Abstract

Typestate systems ensure many desirable properties of imperative 
programs, including initialization of object fields and correct use of 
stateful library interfaces. Abstract sets with cardinality constraints 
naturally generalize typestate properties: relationships between the 
typestates of objects can be expressed as subset and disjointness 
relations on sets, and elements of sets can be represented as sets 
of cardinality one. In addition, sets with cardinality constraints 
provide a natural language for specifying operations and invariants 
of data structures. 

Motivated by these program analysis applications, this paper 
presents new algorithms and new complexity results for constraints 
on sets and their cardinalities. We study several classes of con- 
straints and demonstrate a trade-off between their expressive power 
and their complexity. 

Our first result concerns a quantifier-free fragment of Boolean 
Algebra with Presburger Arithmetic. We give a nondeterministic 
polynomial-time algorithm for reducing the satisfiability of sets 
with symbolic cardinalities to constraints on constant cardinalities, 
and give a polynomial-space algorithm for the resulting problem. 
The best previously existing algorithm runs in exponential space 
and nondeterministic exponential time. 

In a quest for more efficient fragments, we identify several 
subclasses of sets with cardinality constraints whose satisfiabil- 
ity is NP-hard. Finally, we identify a class of constraints that has 
polynomial-time satisfiability and entailment problems and can 
serve as a foundation for efficient program analysis. We give a sys- 
tem of rewriting rules for enforcing certain consistency properties 
of these constraints and show how to extract complete information 
from constraints in normal form. This result implies the soundness 
and completeness of our algorithms. 



1. Introduction 

Program analyses that reason about deep semantic properties are of 
great value for software development; the value of such analyses 
is growing with the adoption of language constructs that eliminate 
low-level program errors. Many deep semantic properties are natu- 
rally expressible in fragments of set theory, so constraint solving for 
such fragments is of interest. This paper presents new algorithms 
and improved complexity bounds for fragments of set theory. The 
starting point of our constraints is the boolean algebra of finite (but 
unbounded) sets. 

Sets in program analysis. The boolean algebra of finite sets
is a fragment of set theory that allows the basic set operations
of intersection, union, and complement on sets of uninterpreted
elements. Although simple, this fragment can express many
properties of interest in program analysis, including typestate
properties and public interfaces of data structures.

Set specifications generalize typestate properties [29, 26]: the
fact that an object $o$ is in the typestate $t$ is represented as the set
membership of $o$ in $t$. Through inclusion and disjointness
constraints, sets can also express relationships (such as hierarchy or
orthogonality) between different typestates. Objects can be
represented as sets of cardinality one using a cardinality constraint
$|o| = 1$, so set membership reduces to subset. Multiple set
memberships can then encode constraints such as $|t| \ge k$ for any
constant $k$.
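To make the encoding concrete, here is a minimal Python sketch, assuming typestates are modelled as finite sets of object identifiers and objects as singleton sets. The helpers `in_state` and `implied_lower_bound` and the `open_files` typestate are hypothetical names used only for illustration; the sketch checks concrete sets rather than deriving facts with a constraint solver.

```python
def in_state(obj: frozenset, state: frozenset) -> bool:
    """Typestate membership o ∈ t, encoded as the subset check obj ⊆ state."""
    if len(obj) != 1:
        raise ValueError("objects are encoded as sets of cardinality one")
    return obj <= state


def implied_lower_bound(objs: list, state: frozenset) -> int:
    """Number k of distinct singleton sets known to lie in `state`.

    Distinct singletons denote distinct elements, so |state| >= k follows.
    """
    return len({o for o in objs if in_state(o, state)})


open_files = frozenset({"f1", "f2", "f3"})   # hypothetical typestate "open"
o1, o2, o4 = frozenset({"f1"}), frozenset({"f2"}), frozenset({"f4"})

print(in_state(o1, open_files))                  # True
print(in_state(o4, open_files))                  # False
k = implied_lower_bound([o1, o2, o1, o4], open_files)
print(k, len(open_files) >= k)                   # 2 True
```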

Sets can also provide natural abstractions of container data
structures. When the content of a data structure is represented as an
abstract set $s$, an operation such as insertion can be characterized
by a postcondition $s' = s \cup e$, where $e$ is the set corresponding
to the element being inserted. By expressing both typestates and
data structure abstractions, sets can be used to combine the results
of different analyses operating on the same program. Such an
approach allows us to combine the scalability of typestate analysis
with the precision of shape analysis and theorem proving [30, 28,
27, 46].
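As a minimal sketch of this idea, the hypothetical Python class below keeps a concrete list, exposes its abstract contents as a set $s$ through an abstraction function, and checks the insertion postcondition $s' = s \cup e$. The check here is a run-time assertion; the approach described in the text would instead establish it statically.

```python
class SetBackedContainer:
    """Hypothetical container whose abstract contents are the set s."""

    def __init__(self) -> None:
        self._items: list = []            # concrete representation

    def contents(self) -> frozenset:
        """Abstraction function: concrete list -> abstract set s."""
        return frozenset(self._items)

    def insert(self, x) -> None:
        s = self.contents()               # pre-state s
        e = frozenset({x})                # singleton set for the new element
        if x not in self._items:
            self._items.append(x)
        # Postcondition from the specification: s' = s ∪ e.
        assert self.contents() == s | e


c = SetBackedContainer()
for x in (3, 1, 3):
    c.insert(x)
print(sorted(c.contents()))               # [1, 3]
```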

Sets with cardinality constraints. The use of the cardinality
operator on sets leads to a connection between set algebra
operations and integer linear arithmetic, as evidenced, for example, in
the condition $|a \cup b| = |a| + |b|$ for disjoint sets $a$ and $b$. It is
therefore natural to consider constraints that combine integer linear
arithmetic with set algebra operations. These constraints constitute
the Quantifier-Free Boolean Algebra with Presburger Arithmetic,
or QFBAPA for short; they are the quantifier-free fragment of
BAPA constraints, whose decision procedure and complexity we
have studied in [23, 22]. QFBAPA constraints can be used to
verify an invariant such as $|a| = |b|$, which allows us to conclude that
if $a$ is nonempty, so is $b$, and therefore it is possible to call an
operation that removes an element from $b$. Similarly, if $i$ is an integer
variable and $s$ is a set, it is possible to verify an invariant $|s| = i$
stating that the integer $i$ correctly maintains the size of the set $s$.
In our experience, specialized decision procedures such as [22] are
the only automated technique for deciding formulas with non-trivial
cardinality constraints. Currently, however, the complexity of these
decision procedures limits their applicability. In this paper we give
new algorithms for solving set cardinality constraints; these
algorithms provide exponential improvements over existing approaches
and make the checking of cardinality constraints in larger formulas
more feasible.
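Both facts used in this paragraph can be sanity-checked by brute force on a small universe. The Python sketch below (hypothetical helper names) enumerates all pairs of subsets and tests them; it only illustrates the semantics on a finite example and is neither a proof nor a decision procedure.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def check_cardinality_facts(universe) -> bool:
    """Exhaustively test two facts from the text on every pair a, b ⊆ universe."""
    for a in subsets(universe):
        for b in subsets(universe):
            # |a ∪ b| = |a| + |b| whenever a and b are disjoint.
            if not (a & b) and len(a | b) != len(a) + len(b):
                return False
            # Invariant |a| = |b|: if a is nonempty then so is b.
            if len(a) == len(b) and a and not b:
                return False
    return True


print(check_cardinality_facts({1, 2, 3, 4}))   # True
```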

Our paper provides a systematic study of constraints on sets in 
the presence of cardinalities. We study both more expressive and 
less expressive fragments and demonstrate a trade-off between the 



On Algorithms and Complexity for Sets with Cardinality Constraints 



2005/8/3 



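$F ::= A \mid F_1 \land F_2 \mid F_1 \lor F_2 \mid \lnot F$

$A ::= B_1 = B_2 \mid B_1 \subseteq B_2 \mid T_1 = T_2 \mid T_1 \le T_2 \mid K \ \mathrm{dvd}\ T$

$B ::= s \mid \emptyset \mid \mathbf{1} \mid B_1 \cup B_2 \mid B_1 \cap B_2 \mid B^c$

$T ::= i \mid K \mid T_1 + T_2 \mid K \cdot T \mid |B|$

$K ::= \ldots \mid -2 \mid -1 \mid 0 \mid 1 \mid 2 \mid \ldots$

Figure 1. Quantifier-Free Formulas of Boolean Algebra with Presburger
Arithmetic (QFBAPA)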



expressive power and the efficiency of the algorithms. The main 
contributions of our paper are the following: 

• PSPACE algorithm for QFBAPA. The best previously
known algorithms for QFBAPA [23, 22, 45] execute in
nondeterministic exponential time and involve searching for an
exponentially large object. In this paper we first give a form of
bounded model property that shows that it is possible to replace
reasoning about symbolic cardinalities such as $|a| = i \land |b| = i$,
where $i$ is an integer variable, with guessing sufficiently large
constant cardinalities, such as $|a| = 1000 \land |b| = 1000$.
Moreover, we give a space-efficient algorithm for solving the resulting
constraints on sets with large constant cardinalities. This
gives a PSPACE decision procedure for QFBAPA and is the
first contribution of this paper.

• A Polynomial-Time Class. Given that QFBAPA constraints
are NP-hard, the question remains whether there are interesting
fragments of sets with cardinalities that can be reasoned
about in polynomial time. In a quest for such fragments, we
identify several features of constraints, each of which leads to
NP-hardness. By eliminating these features we have discovered
a class (called i-trees) that has polynomial-time satisfiability
and entailment (subsumption) problems, while still supporting
subset, union, disjointness, and arbitrarily large cardinality
constraints. This class can therefore express generalized typestate
constraints such as multiple orthogonal classifications into
independent or disjoint sets. The identification of this polynomial-time
class, and the development of algorithms for testing the
satisfiability and subsumption of constraints in this class, is the
second contribution of this paper. While the resulting algorithms
are efficient, the proof of their completeness is somewhat
lengthy and involves characterizations of normal forms
of i-trees and the construction of models for i-trees in normal
form. We therefore only summarize the main ideas and refer
the reader to the full version of the paper [32] for details.
Additional proofs are also included in the Appendix.

We proceed by defining the fragment QFBAPA in Section 2. We 
present a PSPACE algorithm for QFBAPA in Section 3, defining 
the simpler CBAC constraints and identifying their NP-complete 
fragment, CBASC constraints. 

2. Constraints on Sets with Cardinalities 

Boolean Algebra with Presburger Arithmetic. Figure 1 presents
the syntax of the constraints studied in this paper; we call these
formulas Quantifier-Free Boolean Algebra with Presburger Arithmetic
(QFBAPA). QFBAPA constraints contain two kinds of values:
integers and sets, each with corresponding applicable operations. The
sets are interpreted as subsets of some arbitrarily large finite set. $s$
denotes a set variable and $i$ denotes an integer variable. The symbol
$|B|$ denotes the cardinality of the set $B$ and establishes the
connection between set and integer terms. MAXC is a special free variable
denoting the size of the universal set. If $b$ is a set, $b^c$ denotes its
complement. $K \ \mathrm{dvd}\ T$ denotes that $K$ divides $T$. $K$ denotes
constants, encoded in binary: a constant $k$ is encoded using $O(\log k)$
bits. The symbol $A$ in Figure 1 denotes atomic formulas; a literal is
an atomic formula or its negation.
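One way to pin down the semantics just described is an evaluator that checks a QFBAPA formula against a single, concrete finite interpretation. The Python sketch below assumes a hypothetical tuple encoding of the grammar in Figure 1 (for example `("univ",)` for the universal set $\mathbf{1}$ and `("card", B)` for $|B|$); it evaluates formulas but does not search for models, so it is not a decision procedure.

```python
def holds(f, universe, sets, ints) -> bool:
    """Evaluate a QFBAPA formula (Figure 1) under one finite interpretation.

    Formulas are nested tuples, e.g. ("subset", ("set", "a"), ("univ",)).
    `sets` maps set variables to frozensets, `ints` maps integer variables to ints.
    """
    U = frozenset(universe)

    def B(b):                                  # set terms
        tag = b[0]
        if tag == "set":
            return sets[b[1]]
        if tag == "empty":
            return frozenset()
        if tag == "univ":
            return U
        if tag == "union":
            return B(b[1]) | B(b[2])
        if tag == "inter":
            return B(b[1]) & B(b[2])
        if tag == "compl":
            return U - B(b[1])
        raise ValueError(f"unknown set term {tag!r}")

    def T(t):                                  # integer terms
        tag = t[0]
        if tag == "ivar":
            return ints[t[1]]
        if tag == "const":
            return t[1]
        if tag == "plus":
            return T(t[1]) + T(t[2])
        if tag == "times":                     # K · T with constant K
            return t[1] * T(t[2])
        if tag == "card":                      # |B|
            return len(B(t[1]))
        raise ValueError(f"unknown integer term {tag!r}")

    def F(g):                                  # formulas and atoms
        tag = g[0]
        if tag == "and":
            return F(g[1]) and F(g[2])
        if tag == "or":
            return F(g[1]) or F(g[2])
        if tag == "not":
            return not F(g[1])
        if tag == "seteq":
            return B(g[1]) == B(g[2])
        if tag == "subset":
            return B(g[1]) <= B(g[2])
        if tag == "inteq":
            return T(g[1]) == T(g[2])
        if tag == "le":
            return T(g[1]) <= T(g[2])
        if tag == "dvd":                       # K dvd T
            k, v = g[1], T(g[2])
            return v == 0 if k == 0 else v % k == 0
        raise ValueError(f"unknown formula {tag!r}")

    return F(f)


a, b = ("set", "a"), ("set", "b")
# a ⊆ b^c  ∧  |a ∪ b| = |a| + |b|
phi = ("and", ("subset", a, ("compl", b)),
       ("inteq", ("card", ("union", a, b)),
        ("plus", ("card", a), ("card", b))))
print(holds(phi, {1, 2, 3}, {"a": frozenset({1}), "b": frozenset({2, 3})}, {}))     # True
print(holds(phi, {1, 2, 3}, {"a": frozenset({1, 2}), "b": frozenset({2, 3})}, {}))  # False
```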

A quantified version of this language (BAPA) is studied in
[23, 22], where we give an algorithm that establishes a doubly
exponential space upper bound on the complexity. Because quantified
BAPA subsumes Presburger arithmetic, the doubly exponential
nondeterministic time lower bound [15] applies to BAPA as well.

Preliminaries. If $S$ is a finite set, $|S|$ denotes the number of
elements in $S$. $\mathbb{Z} = \{\ldots, -1, 0, 1, \ldots\}$ is the set of
integers, $\mathbb{N} = \{0, 1, \ldots\}$ is the set of natural numbers.
$[a..b]$ denotes the set of integers $\{a, a+1, \ldots, b\}$. If
$f : A \to B$ is a function and $S \subseteq A$, we define
$f[S] = \{f(a) \mid a \in S\}$.

If $A$ is a set, the notation $A^y$ has several potential meanings;
the specific meaning should be clear from the context. $A^n$ for
$n \in \{1, 2, \ldots\}$ is the set of vectors $(a_1, \ldots, a_n)$ where $a_j \in A$
for $1 \le j \le n$, and $A^{m,n}$ is the set of matrices $[a_{pq}]$ with $m$
rows and $n$ columns with elements $a_{pq}$ for $1 \le p \le m$ and
$1 \le q \le n$. The expression $A^c$ denotes the complement of the
set $A$. If $a \in \{0, 1\}$, then $A^a$ denotes $A$ for $a = 1$ and $A^c$ for
$a = 0$.

The relation $\equiv$ denotes the equality of the values of metavariables
denoting syntactic objects, so if $f_1$ and $f_2$ are formulas, then
$f_1 \equiv f_2$ means that they are the same formula. In the context of
inclusion diagrams (Section 4), $\equiv$ will denote the semantic
equivalence of diagrams (we use $=$ to denote the equality of diagrams).

3. A PSPACE Algorithm for QFBAPA

Verification conditions arising in program verification can often
be expressed using quantifier-free formulas, so it is natural to
examine whether more efficient algorithms exist for QFBAPA
constraints. When applied to QFBAPA formulas, existing algorithms
run in nondeterministic exponential time (NEXPTIME): the
algorithm [45] requires nondeterministically guessing an exponentially
large object, whereas the algorithm α from [22] produces an
exponentially large quantifier-free Presburger arithmetic formula.
The question arises whether there exist algorithms that avoid
nondeterministically guessing exponentially large objects. We show
that this is indeed the case. Namely, we first show that Presburger
arithmetic formulas generated by the algorithm α from [22] can in
fact be solved in deterministic exponential time. Our result reduces
QFBAPA to a simpler system of CBAC constraints (shown in
Figure 3), then applies a theorem by Papadimitriou [36] in a novel
way. This leads to a deterministic EXPTIME decision procedure
for QFBAPA satisfiability, which is an improvement on previously
existing algorithms. Nevertheless, the question arises whether it is
possible to avoid the construction of an exponentially large
system of equations. It turns out that this is indeed possible: we
present an alternating polynomial-time (and therefore PSPACE)
algorithm for QFBAPA. Therefore, it is possible to solve QFBAPA
using solvers for quantified boolean formulas [9, 48, 37].

Figures 2 and 4 present our PSPACE algorithm for QFBAPA. 
The algorithm has two phases. 

In the first phase, the non-deterministic polynomial-time algo- 
rithm in Figure 2 reduces QFBAPA constraints to a simpler class 
of constraints. We call these simpler constraints Conjunctions of 
Boolean Algebra expressions with Cardinalities (CBAC). CBAC 
constraints have a very simple syntactic structure (see Figure 3), 
but capture the key difficulty in solving QFBAPA: the need to con- 
sider exponentially large cardinalities on exponentially many set 
partitions. 

In the second phase, the algorithm in Figure 4 checks the satis- 
fiability of CBAC in alternating polynomial time and therefore in 
polynomial space. The key insight behind our algorithm is that it 
is possible to use a divide and conquer approach to avoid explicitly 
representing all possible regions in the Venn diagram. 
Let $f$ be the input QFBAPA formula.

1. Replace each $\mathbb{Z}$-variable with a difference of two
$\mathbb{N}$-variables:

   $C[i_1, \ldots, i_n] \rightarrow C[i_1^{+} - i_1^{-}, \ldots, i_n^{+} - i_n^{-}]$,
   where $i_1^{+}, i_1^{-}, \ldots, i_n^{+}, i_n^{-}$ are fresh $\mathbb{N}$-variables

2. Ensure that all set algebra expressions appear within
cardinality constraints by normalizing with the following
rules:

   $C[b_1 = b_2] \rightarrow C[b_1 \subseteq b_2 \land b_2 \subseteq b_1]$

   $C[b_1 \subseteq b_2] \rightarrow C[|b_1 \cap b_2^c| = 0]$

3. Eliminate divisibility constraints:

   $C[k \ \mathrm{dvd}\ t] \rightarrow C[k \cdot i = t]$, where $i$ is a fresh $\mathbb{N}$-variable

4. Move all cardinality constraints to top level:

   $C[|b_1|, \ldots, |b_{m_1}|] \rightarrow f_1 \land f_2$

   where $f_1 \stackrel{\mathrm{def}}{=} C[i_1, \ldots, i_{m_1}]$,

   $f_2 \stackrel{\mathrm{def}}{=} |\mathbf{1}| = \mathrm{MAXC} \land \bigwedge_{j=1}^{m_1} |b_j| = i_j$,

   and $i_1, \ldots, i_{m_1}$ are fresh $\mathbb{N}$-variables

5. Let $p$ be a propositional formula such that
$f_1 \equiv p(a_1, \ldots, a_m)$ for atomic formulas $a_1, \ldots, a_m$.
Nondeterministically select the truth value $\alpha_j \in \{0, 1\}$
for each atomic formula $a_j$, so that $p(\alpha_1, \ldots, \alpha_m)$ is
true. Let $f_{11} \stackrel{\mathrm{def}}{=} \bigwedge_{j=1}^{m} a_j^{\alpha_j}$,
where $a^1$ denotes $a$ and $a^0$ denotes $\lnot a$.

6. For each conjunct $\lnot(t_1 = t_2)$ in $f_{11}$,
nondeterministically replace the conjunct with one of
the conjuncts $(t_1 + 1 \le t_2)$ or $(t_2 + 1 \le t_1)$.

7. Transform linear integer constraints to normal form:

   $C[\lnot(t_1 \le t_2)] \rightarrow C[t_2 + 1 \le t_1]$

   $C[t_1 \le t_2] \rightarrow C[t_1 - t_2 + i = 0]$, where $i$ is a fresh $\mathbb{N}$-variable

   $C[t_1 = t_2] \rightarrow C[\sum_{j=1}^{n} c_j i_j = k]$

8. Let $n_0$ be the number of integer variables in the entire
formula. The resulting system is of the form

   $Av = d \land \bigwedge_{j=1}^{m_1} |b_j| = i_{p_j}$

where $A \in \mathbb{Z}^{m_0, n_0}$, $d \in \mathbb{Z}^{m_0}$, and $v = (i_1, \ldots, i_{n_0})$,
where each $i_j$ is a variable ranging over $\mathbb{N}$, and
$1 \le p_1, \ldots, p_{m_1} \le n_0$ are the indices of the variables denoting
cardinalities of sets. Let $S$ be the total number of set variables in
$b_1, \ldots, b_{m_1}$. Let $m = m_0 + m_1$, $n = \max(n_0, 2^S)$,
and let $M = n(ma)^{2m+1}$, where $a \ge 1$ bounds the absolute
values of the coefficients and constants of the system.

9. Nondeterministically select a vector $k = (k_1, \ldots, k_{n_0})$
where $k_j \in \{0, 1, \ldots, M\}$ for $1 \le j \le n_0$, such that
$Ak = d$.

10. Call the CBAC decision procedure on $\bigwedge_{j=1}^{m_1} |b_j| = k_{p_j}$.
If there exists a solution, then report the formula satisfiable.

Figure 2. An NP Algorithm for Reducing QFBAPA Constraints
to CBAC constraints of Figure 3



$F ::= |B| = K \mid F_1 \land F_2$

$B ::= s \mid \emptyset \mid \mathbf{1} \mid B_1 \cup B_2 \mid B_1 \cap B_2 \mid B^c$

$K ::= 0 \mid 1 \mid 2 \mid \ldots$

Figure 3. Conjunctions of Boolean Algebra expressions with
Cardinalities (CBAC)



Given a CBAC constraint ∧_{j=1}^{m_1} |b_j| = k_j,
where the free set variables of b_1, ..., b_{m_1} are among s_1, ..., s_S,
run CBAC-check([], d) with d = (k_1, ..., k_{m_1}).

proc CBAC-check([v_1, ..., v_n], d) returns result
  where v_1, ..., v_n, result ∈ {0, 1}; d ∈ N^{m_1}
  if (n < S) then
    existentially choose d_0, d_1 ∈ N^{m_1} such that d_0 + d_1 = d;
    universally do
      r_1 = CBAC-check([v_1, ..., v_n, 0], d_0) and
      r_2 = CBAC-check([v_1, ..., v_n, 1], d_1);
    return r_1 ∧ r_2;
  else
    let p_j = eval(b_j, [s_1 ↦ v_1, ..., s_S ↦ v_S])
      for all (1 ≤ j ≤ m_1);
    J_0 = {d_j | p_j = 0};
    J_1 = {d_j | p_j = 1};
    return J_0 ⊆ {0} ∧ |J_1| ≤ 1.

proc eval(b, α) returns result
  where b : Boolean Algebra formula
        α : {s_1, ..., s_S} → {0, 1}
        result ∈ {0, 1}
  treating b as a propositional formula,
  return the value of b under assignment α.

Figure 4. An Alternating Polynomial-Time (and PSPACE) Algorithm
for Checking the Satisfiability of CBAC Constraints
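For experimentation, the procedure of Figure 4 can be simulated deterministically: the existential choice of $(d_0, d_1)$ becomes `any`, and the universal branch becomes `and`. The Python sketch below assumes each $b_j$ is supplied as a predicate over an assignment `{s_i: v_i}` (a hypothetical encoding). Because it enumerates all splits, it runs in exponential time, although its recursion depth is only $S$; it illustrates the divide and conquer structure rather than an efficient implementation.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Sequence, Tuple

Expr = Callable[[Dict[str, int]], object]      # b_j as a predicate over {s_i: v_i}


def splits(d: Tuple[int, ...]) -> Iterator[Tuple[Tuple[int, ...], Tuple[int, ...]]]:
    """All (d0, d1) in N^m1 x N^m1 with d0 + d1 = d."""
    for d0 in product(*(range(x + 1) for x in d)):
        yield d0, tuple(x - y for x, y in zip(d, d0))


def cbac_check(set_vars: Sequence[str], exprs: Sequence[Expr],
               d: Tuple[int, ...], prefix: Tuple[int, ...] = ()) -> bool:
    """Deterministic simulation of CBAC-check([v_1, ..., v_n], d) from Figure 4."""
    if len(prefix) < len(set_vars):
        # Existential choice of (d0, d1), universal over the two sub-calls.
        return any(cbac_check(set_vars, exprs, d0, prefix + (0,)) and
                   cbac_check(set_vars, exprs, d1, prefix + (1,))
                   for d0, d1 in splits(d))
    # Leaf: `prefix` is a complete assignment, i.e. a single Venn region.
    env = dict(zip(set_vars, prefix))
    p = [1 if b(env) else 0 for b in exprs]          # p_j = eval(b_j, ...)
    J0 = {dj for dj, pj in zip(d, p) if pj == 0}
    J1 = {dj for dj, pj in zip(d, p) if pj == 1}
    return J0 <= {0} and len(J1) <= 1


def cbac_satisfiable(set_vars, exprs, ks) -> bool:
    """Satisfiability of the CBAC constraint AND_j |b_j| = k_j."""
    return cbac_check(set_vars, exprs, tuple(ks))


def in_s1(v):
    return v["s1"]


def in_s2(v):
    return v["s2"]


def in_both(v):
    return v["s1"] and v["s2"]


print(cbac_satisfiable(["s1", "s2"], [in_s1, in_s2, in_both], [2, 2, 1]))  # True
print(cbac_satisfiable(["s1", "s2"], [in_s1, in_both], [1, 2]))            # False
```

The second call is unsatisfiable because $|s_1 \cap s_2| = 2$ cannot exceed $|s_1| = 1$.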



We next discuss our algorithm in more detail and argue that it is 
correct. We begin with the description of the steps of the algorithm 
in Figure 2, which reduces symbolic cardinalities to large constant 
cardinalities. 

1. Non-negative integers. To simplify the later steps, the first step
makes all integer variables range over the non-negative integers $\mathbb{N}$, by
replacing each integer variable $i$ with a difference $i_1 - i_2$ of fresh
non-negative integer variables $i_1, i_2$.

2,3. Eliminating set equality and subset, and integer divisibility. 

The next step converts set equality and set subset into cardinality 
constraints. This step helps the later separation between the boolean 
algebra part and the integer linear arithmetic part. We then elimi- 
nate any divisibility relations using multiplication and a fresh vari- 
able. 
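The rewrites of steps 2 and 3 can be checked against the relations they replace. The Python sketch below (hypothetical helper names) compares each rewrite with the original relation on every pair of subsets of a three-element universe; for divisibility it assumes $k \ge 1$ and $t \ge 0$, so a witness $i$ for $k \cdot i = t$ can be searched in $[0..t]$.

```python
from itertools import product


def subset_via_card(b1: frozenset, b2: frozenset, universe: frozenset) -> bool:
    """Step 2: b1 ⊆ b2 rewritten as |b1 ∩ b2^c| = 0."""
    return len(b1 & (universe - b2)) == 0


def equal_via_card(b1: frozenset, b2: frozenset, universe: frozenset) -> bool:
    """Step 2: b1 = b2 rewritten as b1 ⊆ b2 ∧ b2 ⊆ b1, then as cardinalities."""
    return subset_via_card(b1, b2, universe) and subset_via_card(b2, b1, universe)


def dvd_via_fresh_var(k: int, t: int) -> bool:
    """Step 3: k dvd t rewritten as k·i = t for a fresh i ∈ N.

    Assumes k >= 1 and t >= 0, so any witness i lies in [0..t].
    """
    return any(k * i == t for i in range(t + 1))


U = frozenset(range(3))
ALL = [frozenset(x for x, bit in zip(sorted(U), bits) if bit)
       for bits in product((0, 1), repeat=len(U))]

# Compare each rewrite with the original relation on every pair of subsets of U.
assert all(subset_via_card(p, q, U) == (p <= q) for p in ALL for q in ALL)
assert all(equal_via_card(p, q, U) == (p == q) for p in ALL for q in ALL)
assert all(dvd_via_fresh_var(3, t) == (t % 3 == 0) for t in range(20))
print("rewrites agree on all", len(ALL) ** 2, "pairs")
```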

4. Flattening. The next step separates the formula into the
integer linear arithmetic part, denoted $f_1$, and the boolean algebra
part, denoted $f_2$. This step simply amounts to naming the cardinality
of each set by a fresh integer variable.

5,6. From quantifier-free formulas to conjunctions. An obvious
source of NP-hardness of QFBAPA is the presence of arbitrary
propositional combinations of atomic formulas. An effective way
of dealing with propositional combinations is to enumerate the
satisfying assignments of the propositional formula using a SAT
solver, and then solve the conjunctions of literals [16, 17]. Steps 5
and 6 of the nondeterministic algorithm in Figure 2 are an abstract
description of such a procedure. The goal of step 6 is to eliminate
disequalities, each of which is replaced by a nondeterministic choice
between two inequalities.
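A naive stand-in for steps 5 and 6 is sketched below in Python: it enumerates all truth assignments of the propositional skeleton instead of calling a SAT solver, and it lists the two inequalities that may replace a disequality. The function names and the string form of the inequalities are hypothetical choices for illustration only.

```python
from itertools import product
from typing import Callable, Iterator, List, Tuple


def satisfying_assignments(p: Callable[..., object], m: int) -> Iterator[Tuple[int, ...]]:
    """Deterministic stand-in for step 5: try every (alpha_1, ..., alpha_m) in {0,1}^m.

    A practical implementation would obtain these assignments from a SAT solver.
    """
    for alpha in product((0, 1), repeat=m):
        if p(*alpha):
            yield alpha


def split_disequality(t1: str, t2: str) -> List[str]:
    """Step 6: the conjunct ¬(t1 = t2) is replaced by one of these two conjuncts."""
    return [f"{t1} + 1 <= {t2}", f"{t2} + 1 <= {t1}"]


# Hypothetical skeleton p(a1, a2) = a1 ∨ ¬a2 over two atomic formulas.
print(list(satisfying_assignments(lambda a1, a2: a1 or not a2, 2)))
# [(0, 0), (1, 0), (1, 1)]
print(split_disequality("x", "y"))
# ['x + 1 <= y', 'y + 1 <= x']
```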

7. Normal form for integer constraints. The algorithm eliminates
the remaining negations of atomic formulas and transforms
linear constraints into the normal form $Av = d$.

8,9,10. Estimating sizes of integer variables. The resulting system
contains linear integer equations of the form $\sum_{j=1}^{n} c_j i_j = k$
and set cardinality constraints of the form $|b| = i$. The algorithm
computes a bound $M$ such that, if the system has a solution, it has
one in which every integer variable is at most $M$. The computation
uses several parameters: the number of conjuncts $m$, the number
of integer variables $n_0$, and the number of set variables $S$. It is
based on the observation that the satisfiability of the conjunction of
constraints $|b| = i$ can be reduced to the satisfiability of equations
of the form $\sum_{j=1}^{p} l_j = i$, where the variables $l_j$ denote the sizes
of set partitions (regions in the Venn diagram) whose union is the set $b$;
this is a specialization of the idea in [22] to the case of quantifier-free
formulas.

Let $s_1, \ldots, s_S$ be all set variables appearing in the formula.
Consider all partitions $\bigcap_{j=1}^{S} s_j^{\alpha_j}$ for $\alpha_j \in \{0, 1\}$. For each
such partition $b_p$, introduce a fresh $\mathbb{N}$-variable $l_p$, which
denotes the cardinality of the cube $b_p$. Then consider a constraint of
the form $|b| = i$. Each set is a union of regions in the Venn diagram
(by the disjunctive normal form theorem), so suppose that
$b = b_{p_1} \cup \ldots \cup b_{p_a}$. Then replace the constraint $|b| = i$ with
$\sum_{q=1}^{a} l_{p_q} = i$. We use the term "CBAC linear equations" to denote
a system of linear equations resulting from the constraints $|b| = i$
as described above.
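The construction can be sketched in Python under the assumption that each $b_j$ is given as a predicate over the membership bits of a region (a hypothetical encoding). The sketch builds the $2^S$ region columns explicitly, which is exactly the exponential blow-up the algorithms in this section are designed to avoid, so it is suitable only for tiny examples.

```python
from itertools import product


def cbac_linear_equations(set_vars, exprs):
    """Venn regions and coefficient rows of the CBAC linear equations.

    Each region is a tuple alpha in {0,1}^S (the cube s_1^{alpha_1} ∩ ... ∩ s_S^{alpha_S}).
    Each b_j in `exprs` is a predicate over a dict {s_i: alpha_i}; row j has a 1 for
    every region contained in b_j and a 0 elsewhere.
    """
    regions = list(product((0, 1), repeat=len(set_vars)))
    rows = [[1 if b(dict(zip(set_vars, alpha))) else 0 for alpha in regions]
            for b in exprs]
    return regions, rows


def solves(rows, l, ks) -> bool:
    """Check sum_p a_{jp} * l_p = k_j for every equation j."""
    return all(sum(a * x for a, x in zip(row, l)) == k for row, k in zip(rows, ks))


# |s1 ∪ s2| = 3 and |s1 ∩ s2^c| = 1 over set variables s1, s2.
regions, rows = cbac_linear_equations(
    ["s1", "s2"],
    [lambda v: v["s1"] or v["s2"], lambda v: v["s1"] and not v["s2"]])
print(regions)                               # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(rows)                                  # [[0, 1, 1, 1], [0, 0, 1, 0]]
print(solves(rows, [0, 1, 1, 1], [3, 1]))    # True
```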

As a result, we obtain a system of $m_0 + m_1$ linear equations
over non-negative integers, where $m_0$ equations have a polynomial
number of variables, and $m_1$ equations (CBAC linear equations)
have exponentially many variables. It is easy to see that there exists
a surjective mapping of solutions of the original constraints on
sets onto solutions of the resulting linear equations (the mapping
computes the cardinality of each region of the Venn diagram). Therefore, the
original system is satisfiable if and only if the resulting equations
are satisfiable. Moreover, we have the following fact.

FACT 1 (Papadimitriou [36]). Let $A$ be an $m \times n$ integer matrix
and $b$ an $m$-vector, both with entries from $[-a..a]$. Then the system
$Ax = b$ has a solution in $\mathbb{N}^n$ if and only if it has a solution in
$[0..M]^n$, where $M = n(ma)^{2m+1}$.
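The bound in Fact 1 is straightforward to compute, and for very small systems it even yields an exhaustive decision procedure for solvability over $\mathbb{N}$. The Python sketch below takes $a$ to be the largest absolute value of an entry of $A$ and $b$ (at least 1); the function names are hypothetical, and the box search is exponential, so it serves only to illustrate the statement.

```python
from itertools import product


def papadimitriou_bound(A, d) -> int:
    """M = n (m a)^(2m+1) from Fact 1, with a the largest |entry| of A and d (at least 1)."""
    m, n = len(A), len(A[0])
    a = max([1] + [abs(x) for row in A for x in row] + [abs(x) for x in d])
    return n * (m * a) ** (2 * m + 1)


def has_solution_in_box(A, d) -> bool:
    """Search [0..M]^n exhaustively; by Fact 1 this decides solvability of Ax = d over N."""
    M = papadimitriou_bound(A, d)
    n = len(A[0])
    for x in product(range(M + 1), repeat=n):
        if all(sum(c * v for c, v in zip(row, x)) == rhs for row, rhs in zip(A, d)):
            return True
    return False


print(papadimitriou_bound([[1, -1]], [1]))    # 2
print(has_solution_in_box([[1, -1]], [1]))    # True  (e.g. x = (1, 0))
print(has_solution_in_box([[2, 2]], [3]))     # False (2x1 + 2x2 is always even)
```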

Fact 1 implies that the estimate $M$ computed in step 8 of the
algorithm in Figure 2 is a correct upper bound. Using this estimate, step
9 of the algorithm nondeterministically guesses the values of all
integer variables such that the original linear equations $Av = d$ are
satisfied. All this computation can be performed in nondeterministic
polynomial time and, unlike [22], does not involve explicitly
constructing an exponentially large system of equations. Having
picked the values of the integer variables, including the variables $i$ on
the right-hand side of constraints $|b| = i$, we obtain a conjunction
of constraints of the form $|b| = k$, where $k$ is a constant whose
binary representation has polynomially many bits; these are
precisely the CBAC constraints in Figure 3. We have therefore shown
the following.

LEMMA 1. The algorithm in Figure 2 reduces in nondeterministic
polynomial time the satisfiability of a QFBAPA formula to the
satisfiability of CBAC formulas.

It remains to find an algorithm for CBAC constraints. 



A PSPACE algorithm for CBAC. One correct way to solve
CBAC constraints is to solve the associated CBAC linear equations.
This system has exponentially many variables, each of which
can take any value from $[0..M]$. Therefore, guessing the values of
all these variables can be done in nondeterministic exponential
time; similar approaches not based on equations also require
guessing exponentially large objects [45]. Note, however, that there
are only polynomially many CBAC linear equations. Using the idea
of the proof of [36, Corollary 1], we can therefore show that a dynamic
programming algorithm can solve the system in time polynomial in
the number of its variables when the number of equations is treated
as a constant. In fact, we can use the dynamic programming algorithm
from the proof of [36, Corollary 1]. Instead of fixing the number of
equations $m_1$ to be constant, we simply observe that $m_1$ is
polynomial in the size of the input, whereas the number of variables
is singly exponential. The bound $M$ therefore yields a singly
exponential deterministic time dynamic programming algorithm for
CBAC. While this is better than existing results, we show that an
even better result is achievable.

Clearly, any algorithm that explicitly constructs CBAC equations
will require at least exponential time and space. Our solution
is therefore to adapt the dynamic programming algorithm to a divide
and conquer approach that always represents the equations in
terms of their original, polynomially sized, boolean algebra
expressions. Such an algorithm runs in alternating polynomial time,
consuming polynomial space, and is presented in Figure 4. To see the
idea of our PSPACE algorithm, consider the CBAC linear system
of equations written in the vector form $\sum_{j=1}^{2^S} a_j l_j = d$, where $d$
and $a_j$ are vectors and $l_j$ are the variables for $1 \le j \le 2^S$. The
algorithm guesses the vectors $d_0, d_1 \in \mathbb{N}^{m_1}$ such that $d_0 + d_1 = d$, and
recursively solves two equations:

$\sum_{j=1}^{2^{S-1}} a_j l_j = d_0 \quad \land \quad \sum_{j=2^{S-1}+1}^{2^S} a_j l_j = d_1$
This algorithm creates an OR-AND tree whose search gives the
answer to the original problem. A position in the tree is given by the
propositional assignment $[v_1, \ldots, v_n]$ to boolean variables. Each
leaf in the tree is given by a complete assignment $[v_1, \ldots, v_S]$
to set variables. Note that we never need to explicitly maintain
the system during the divide phase of the algorithm; it suffices to
determine, in the leaf case $n = S$, whether each component of the
coefficient vector $a_j$ is $0$ or $1$. The algorithm does this by simply
evaluating each Boolean algebra expression $b$ for the assignment
$[v_1, \ldots, v_S]$.

THEOREM 1. The algorithm in Figure 4 checks the satisfiability of
CBAC constraints in PSPACE. The algorithm given by Figures 2
and 4 checks the satisfiability of QFBAPA constraints in PSPACE.

Theorem 1 improves the existing algorithms for QFBAPA from
both a complexity-theoretic and an implementation viewpoint. A
deterministic realization of previous NEXPTIME algorithms runs
in doubly exponential worst-case time and requires exponential
space; a deterministic realization of our new algorithm runs in
singly exponential time and consumes polynomial space. Previous
algorithms would require running a constraint solver such as a SAT
solver [47] on an exponentially large constraint; the new algorithm
can be implemented by running a quantified boolean formula solver [48]
on a polynomially large constraint.

NP fragments of CBAC. We have seen that both CBAC and
QFBAPA constraints are in PSPACE. Both of these classes of
constraints are NP-hard, because the constraint $|b| = 1$ is satisfiable iff
$b$ corresponds to a satisfiable propositional formula. Moreover,
Lemma 1 shows that QFBAPA constraints are in NP iff CBAC
constraints are in NP. For some subclasses of CBAC constraints we can
indeed show membership in NP. Define conjunctions of boolean
algebra expressions with small cardinalities, denoted CBASC, to be
the same as CBAC but with constant integers encoded in unary
notation, where an integer $x$ is represented in space $O(x)$ as
opposed to $O(\log x)$; such an encoding can therefore be exponentially
less compact.

LEMMA 2. The satisfiability of CBASC constraints is NP- 
complete. 

CBASC constraints are NP-hard because $|b| = 1$ is a CBASC
constraint. One way to prove membership in NP is to observe that
CBASC is subsumed by the language of set-valued fields, which
was proven to be in NP [24, 25] by reduction to the universal class
of first-order logic formulas, which has the small model property
[7, Page 258]. Another way is to consider the notion of sparse
solutions of CBAC linear equations. An $M$-sparse solution is a solution
to CBAC linear constraints with at most $M$ non-zero elements. An
$M$-sparse solution to CBAC linear constraints with $2^S$ variables
can be encoded as an $M$-tuple of pairs $([v_1, \ldots, v_S], k)$, where the
propositional assignment $[v_1, \ldots, v_S]$ encodes one of the $2^S$
integer variables, and $k$ specifies the value of that integer variable.
This encoding is polynomial in $MSw$, where $w$ is the number of
bits for representing the largest component of the solution. For any
CBAC constraint $\bigwedge_{j=1}^{m_1} |b_j| = k_j$ whose CBAC linear equations
have a solution, setting to zero the variables of all regions that lie
outside every $b_j$ yields another solution; in it, each non-zero variable
contributes at least 1 to some $k_j$, so it is $M$-sparse for
$M = k_1 + \ldots + k_{m_1}$. For CBASC constraints, $M$ is polynomial
in the size of the CBASC representation because each $k_j$ is
encoded in unary, so sparse solutions can be guessed in polynomial
time. This proves that CBASC constraints are in NP.¹
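The polynomial-time verification step of this NP argument can be sketched as follows. The Python function below checks a sparse solution given as pairs of a complete assignment and a value, treating unlisted regions as having size 0; the predicate encoding of the $b_j$ and all names are hypothetical. Its work is proportional to the number of listed pairs times the number of expressions.

```python
from typing import Callable, Dict, Sequence, Tuple

Expr = Callable[[Dict[str, int]], object]


def check_sparse_solution(set_vars: Sequence[str], exprs: Sequence[Expr],
                          ks: Sequence[int],
                          sparse: Sequence[Tuple[Tuple[int, ...], int]]) -> bool:
    """Verify a sparse solution given as pairs ([v_1, ..., v_S], k).

    Regions that are not listed are taken to have size 0.
    """
    totals = [0] * len(exprs)
    seen = set()
    for bits, value in sparse:
        if len(bits) != len(set_vars) or bits in seen or value < 0:
            return False                      # malformed or repeated region
        seen.add(bits)
        env = dict(zip(set_vars, bits))
        for j, b in enumerate(exprs):
            if b(env):                        # region lies inside b_j
                totals[j] += value
    return totals == list(ks)


def in_s1(v):
    return v["s1"]


def in_s2(v):
    return v["s2"]


def in_both(v):
    return v["s1"] and v["s2"]


# |s1| = 2, |s2| = 2, |s1 ∩ s2| = 1 witnessed by three non-zero regions.
witness = [((1, 1), 1), ((1, 0), 1), ((0, 1), 1)]
print(check_sparse_solution(["s1", "s2"], [in_s1, in_s2, in_both], [2, 2, 1], witness))  # True
print(check_sparse_solution(["s1", "s2"], [in_s1, in_s2, in_both], [2, 2, 1],
                            [((1, 1), 2)]))                                          # False
```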

4. Inclusion Diagrams 

This section introduces inclusion diagrams (i-diagrams), a graph 
representation of CBAC constraints. Figure 5 shows a formula with 
sets and cardinalities and an equivalent i-diagram. I-diagrams allow 
us to naturally describe fragments of CBAC constraints and the al- 
gorithms for checking satisfiability and subsumption of these frag- 
ments. The basic idea of i-diagrams is to represent the subset partial 
order using a graph where sets are annotated with cardinalities, and 
then indicate the disjointness and union relations by constraints on 
direct subsets of a set. To efficiently represent equal sets, the nodes 
in the i-diagram stand not for set names, but for collections of set 
names that are guaranteed to be equal. Finally, we associate un- 
interpreted predicates with collections of nodes, representing the 
fact that elements of given sets satisfy the properties given by the 
predicate. The uninterpreted predicates illustrate a way to combine 
i-diagram representations with other constraints. 

DEFINITION 1 (i-diagrams). We fix a finite set SN of set names
and a finite set PN of predicate names. We denote by $\overline{\mathrm{PN}}$ the set
of atoms $\{+P, -P \mid P \in \mathrm{PN}\}$.

An i-diagram (inclusion diagram) is either the null-diagram $\bot_d$ or
a tuple $(\mathbb{S}, \emptyset_d, \mathit{Sons}, \mathit{Split}, \mathit{Comp}, \mathit{CInf}, \mathit{CSup}, \Phi)$ such that:

• $\mathbb{S} \subseteq \mathcal{P}(\mathrm{SN})$ is a partition of SN containing (nonempty)
equivalence classes of set names that are guaranteed to be equal, with
$\emptyset_d \in \mathbb{S}$ the equivalence class corresponding to names of sets
whose interpretation is the empty set $\emptyset$;

• $\mathit{Sons} : \mathbb{S} \to \mathcal{P}(\mathbb{S})$ represents the subset relation;
we define $S \rightsquigarrow S' \iff S \in \mathit{Sons}(S')$; then $(\mathbb{S}, \rightsquigarrow)$ is a
graph, so we call elements of $\mathbb{S}$ nodes and the elements of $\rightsquigarrow$
edges; we write $\rightsquigarrow^*$ for the transitive closure of $\rightsquigarrow$;

• $\mathit{Split}, \mathit{Comp} : \mathbb{S} \to \mathcal{P}(\mathcal{P}(\mathbb{S}))$ represent disjointness and
completeness of set inclusions; if $S$ is a node, then $\mathit{Split}(S)$ is a set
of split views, where each view is a nonempty set of sons that



1 Sparse solutions are interesting for general CBAC constraints as well. As 
of yet we have no example of a CBAC constraint whose associated CBAC 
equation system is satisfiable but has no sparse solutions; moreover, we can 
generalize the notion of sparse solutions to solutions representable using 
binary decision diagrams [8] while preserving polynomial-time verifiability. 



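$\mathcal{D}$ is such that $\mathit{CInf}(\{s_1\}) = 1$, $\mathit{CSup}(\{s_1\}) = 5$,
$\mathit{Sons}(\{s_1\}) = \{\{s_5, s_6\}, \{s_4\}, \{s_3\}\}$, $\mathit{Comp}(\{s_1\}) = \{\{\{s_5, s_6\}, \{s_4\}, \{s_3\}\}\}$,

$\mathit{Split}(\{s_1\}) = \{\{\{s_5, s_6\}\}, \{\{s_4\}, \{s_3\}\}\}$, $\Phi(\{s_1\}) = \{+P\}$

and is equivalent to

$s_2 = \emptyset \land s_5 = s_6 \land$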

52 U s 3 U Sa U Ss C si A Si C S5 A S3 C S2 A s 5 C s 4 

53 Pi 54 = A Si C S3 U S4 U S5 A S4 C S5 

$1 \le |s_1| \le 5 \land |s_4| = 1 \land |s_5| \le 3 \land |s_3| \le 2 \land$
$\forall x \in s_1.\ P(x) \land \forall x \in s_3.\ \lnot Q(x)$

Figure 5. An example i-diagram $\mathcal{D}$ and an equivalent formula



represent pairwise disjoint sets, and $\mathit{Comp}(S)$ is a set of
complete views, each of which is a set of nodes that represent sets
whose union is equal to the father; we require

$\bigcup \mathit{Split}(S) = \mathit{Sons}(S) \qquad \bigcup \mathit{Comp}(S) \subseteq \mathit{Sons}(S)$

for all $S \in \mathbb{S}$;

• $\mathit{CInf}, \mathit{CSup} : \mathbb{S} \to \mathbb{N}$ specify lower and upper bounds on the
cardinality of sets;

• $\Phi : \mathbb{S} \to \mathcal{P}(\overline{\mathrm{PN}})$ maps nodes to the uninterpreted unary
predicates and their negations that are true for all sets of a
node.

To avoid confusion between set names, nodes (sets of set names),
and views (sets of nodes), we use lowercase letters $s, s_1, s'$ to
denote set names, uppercase letters $S, S_1, S'$ to denote nodes,
and letters $\mathcal{Q}, \mathcal{C}$ to denote views and sets of nodes in
general. When $\mathcal{D} \neq \bot_d$ is a diagram, unless otherwise stated, we
name its components $\mathbb{S}, \emptyset_d, \mathit{Sons}, \mathit{Split}, \mathit{Comp}, \mathit{CInf}, \mathit{CSup}, \Phi$,
and similarly we name the components of $\mathcal{D}'$ as
$\mathbb{S}', \emptyset'_d, \mathit{Sons}', \mathit{Split}', \mathit{Comp}', \mathit{CInf}', \mathit{CSup}', \Phi'$.

In a graphical representation of an i-diagram, we represent
each element $S \in \mathbb{S}$ where $S = \{s_1, \ldots, s_n\}$ using its
underlying set names $s_1, \ldots, s_n$. We represent an inclusion $S_1 \rightsquigarrow S_2$ by an
arrow from $S_1$ to $S_2$. We represent a split view $\mathcal{Q} \in \mathit{Split}(S)$
where $\mathcal{Q} = \{S_1, \ldots, S_n\}$ with a circle connected with undirected
edges to $S_1, \ldots, S_n$ and an arrow leading to $S$. We represent a
complete view similarly, using a filled square instead of a circle.
For each node $S \in \mathbb{S}$ we indicate its cardinality bounds by
annotating the node with $[a..b]$, where $a = \mathit{CInf}(S)$ and $b = \mathit{CSup}(S)$.
We represent $\Phi(S) = \{\pm P_1, \ldots, \pm P_n\}$ by annotating $S$ with
$\pm P_1, \ldots, \pm P_n$. We represent $\emptyset_d = \{s_1, \ldots, s_n\}$ by annotating
the node $\{s_1, \ldots, s_n\}$ with $\emptyset$.

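DEFINITION 2 (Semantics of i-diagrams). An interpretation of SN
and PN is a triple $(A, \alpha, \beta)$ where

• $A$ is a finite set (the universe);

• $\alpha : \mathrm{SN} \to \mathcal{P}(A)$ specifies the values of sets;

• $\beta : \mathrm{PN} \to \mathcal{P}(A)$ specifies the values of unary predicates.

An interpretation $I$ is a model for an i-diagram $\mathcal{D}$, denoted $I \models \mathcal{D}$,
iff $\forall s \in \emptyset_d.\ \alpha(s) = \emptyset$, and for all $S \in \mathbb{S}$ where $S = \{s_1, \ldots, s_n\}$,
the following conditions hold:

• $\alpha(s_1) = \ldots = \alpha(s_n)$; accordingly, define $\alpha(S) \stackrel{\mathrm{def}}{=} \alpha(s_1) = \ldots = \alpha(s_n)$

• $\mathit{CInf}(S) \le |\alpha(S)| \le \mathit{CSup}(S)$

• $\forall P.\ (+P) \in \Phi(S) \Rightarrow \alpha(S) \subseteq \beta(P)$

• $\forall P.\ (-P) \in \Phi(S) \Rightarrow \alpha(S) \subseteq \beta(P)^c$

• $\forall S' \in \mathit{Sons}(S).\ \alpha(S') \subseteq \alpha(S)$

• $\forall \mathcal{Q} \in \mathit{Split}(S).\ \forall S_1, S_2 \in \mathcal{Q}.\ S_1 \neq S_2 \Rightarrow \alpha(S_1) \cap \alpha(S_2) = \emptyset$

• $\forall \mathcal{Q} \in \mathit{Comp}(S).\ \alpha(S) \subseteq \bigcup_{S_i \in \mathcal{Q}} \alpha(S_i)$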

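We use the standard notions of satisfiability, subsumption (entailment), and equivalence:

$\mathcal{D}$ is satisfiable $\iff \exists I.\ I \models \mathcal{D}$

$\mathcal{D} \models \mathcal{D}' \iff \forall I.\ I \models \mathcal{D} \Rightarrow I \models \mathcal{D}'$

$\mathcal{D} \equiv \mathcal{D}' \iff \mathcal{D} \models \mathcal{D}' \wedge \mathcal{D}' \models \mathcal{D}$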
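To make Definition 2 concrete, the following Python sketch checks whether a given finite interpretation is a model of an i-diagram. The dictionary encoding (keys `nodes`, `empty`, `sons`, `split`, `comp`, `cinf`, `csup`, `phi`, with predicate literals written as `("+", P)` or `("-", P)` pairs), the node identifiers, and the function name `is_model` are illustrative assumptions, not notation from the paper; $\bot_d$ is not represented.

```python
from itertools import combinations


def is_model(d, universe, alpha, beta):
    """Check (universe, alpha, beta) |= d following Definition 2 (illustrative encoding)."""
    values = {}
    for node, names in d["nodes"].items():
        vals = {frozenset(alpha[s]) for s in names}
        if len(vals) != 1:              # all set names of a node must denote the same set
            return False
        values[node] = vals.pop()
        if not values[node] <= universe:
            return False
    if values[d["empty"]]:              # the set names in the empty node denote {}
        return False
    for node, a in values.items():
        if not d["cinf"][node] <= len(a) <= d["csup"][node]:
            return False
        for sign, p in d["phi"].get(node, ()):
            if sign == "+" and not a <= beta[p]:
                return False
            if sign == "-" and a & beta[p]:
                return False
        if any(not values[son] <= a for son in d["sons"].get(node, ())):
            return False
        for view in d["split"].get(node, ()):   # members of a split view are disjoint
            if any(values[x] & values[y] for x, y in combinations(view, 2)):
                return False
        for view in d["comp"].get(node, ()):    # a complete view covers its father
            if not a <= frozenset().union(*(values[x] for x in view)):
                return False
    return True


# Hypothetical example: S is partitioned into A and B, and every element of S satisfies P.
EXAMPLE = {
    "nodes": {"E": {"e"}, "S": {"s"}, "A": {"a"}, "B": {"b"}},
    "empty": "E",
    "sons": {"S": ["A", "B"]},
    "split": {"S": [["A", "B"]]},
    "comp": {"S": [["A", "B"]]},
    "cinf": {"E": 0, "S": 2, "A": 1, "B": 1},
    "csup": {"E": 0, "S": 3, "A": 2, "B": 2},
    "phi": {"S": {("+", "P")}},
}
```

With the universe `{1, 2, 3}`, choosing `a = {1}` and `b = {2, 3}` yields a model, while choosing `b = {1, 2}` violates the split view.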



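DEFINITION 3 (Explicit Disjointness). We write $\mathrm{disj}_{S_0}(S_1, S_2)$ as a shorthand for

$S_1 \neq S_2 \wedge \exists Q \in \mathrm{Split}(S_0).\ S_1, S_2 \in Q$

and we say that $S_1, S_2$ are explicitly disjoint, and write $\mathrm{disj}_{\mathcal{D}}(S_1, S_2)$, iff

$\exists S'_1, S'_2, S_0 \in \mathbb{S}.\ S_1 \leadsto^* S'_1 \wedge S_2 \leadsto^* S'_2 \wedge \mathrm{disj}_{S_0}(S'_1, S'_2)$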
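Definition 3 amounts to a reachability check followed by a lookup in the split views. The Python sketch below is illustrative only: nodes are identifiers, `sons` and `split` are dictionaries of lists, $S_1 \leadsto S_2$ is assumed to mean $S_1 \in \mathrm{Sons}(S_2)$, $\leadsto^*$ is taken to be reflexive, and all function names are hypothetical.

```python
def reflexive_ancestors(sons, node):
    """All S with node ~>* S, reading S1 ~> S2 as S1 in Sons(S2)."""
    parents = {}
    for father, children in sons.items():
        for child in children:
            parents.setdefault(child, set()).add(father)
    seen, stack = {node}, [node]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def disj_direct(split, s0, s1, s2):
    """disj_{S0}(S1, S2): S1 != S2 and some split view of S0 contains both."""
    return s1 != s2 and any(s1 in q and s2 in q for q in split.get(s0, ()))


def explicitly_disjoint(sons, split, s1, s2):
    """disj_D(S1, S2): some ancestors S1', S2' of S1, S2 are split siblings."""
    for a1 in reflexive_ancestors(sons, s1):
        for a2 in reflexive_ancestors(sons, s2):
            if any(disj_direct(split, s0, a1, a2) for s0 in split):
                return True
    return False
```

For instance, if a root `R` has the split view `[A, B]`, then every descendant of `A` is explicitly disjoint from every descendant of `B`, but `A1` (a son of `A`) is not explicitly disjoint from `A` itself.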

LEMMA 3. I-diagrams have the same expressive power as CBAC 
constraints. 

By "same expressive power" we here mean that there is a natural 
pair of mappings between the models of i-diagrams and solutions 
to CBAC constraints. 

Because nodes in i-diagrams are collections of set names, we 
can define the following operations. 

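DEFINITION 4 (Factor i-diagram). Let $\rho \subseteq \mathbb{S} \times \mathbb{S}$ be an equivalence relation on nodes. We define $\mathcal{D}/\rho$ as follows. Define $\bot_d/\rho = \bot_d$. Let $\mathcal{D} = (\mathbb{S}, \emptyset_d, \mathrm{Sons}, \mathrm{Split}, \mathrm{Comp}, \mathrm{CInf}, \mathrm{CSup}, \Phi)$. We define $\mathcal{D}/\rho = \mathcal{D}' = (\mathbb{S}', \emptyset'_d, \mathrm{Sons}', \mathrm{Split}', \mathrm{Comp}', \mathrm{CInf}', \mathrm{CSup}', \Phi')$ as follows. Define $h$ so that if $\{S_1, \ldots, S_n\}$ is the equivalence class of $S$ under $\rho$, then $h(S) = S_1 \cup \ldots \cup S_n$. If $Q \subseteq \mathbb{S}$, define $h[Q] = \{h(S) \mid S \in Q\}$. Then let $\mathbb{S}' = h[\mathbb{S}]$ and $\emptyset'_d = h(\emptyset_d)$. Consider $S' \in \mathbb{S}'$. Both $\mathbb{S}$ and $\mathbb{S}'$ are partitions, and given $S' \in \mathbb{S}'$ there is a unique set $\{S_1, \ldots, S_n\} \subseteq \mathbb{S}$ such that $S' = S_1 \cup \ldots \cup S_n$. Then define:

$\mathrm{CInf}'(S') = \max(\mathrm{CInf}(S_1), \ldots, \mathrm{CInf}(S_n))$
$\mathrm{CSup}'(S') = \min(\mathrm{CSup}(S_1), \ldots, \mathrm{CSup}(S_n))$

$\mathrm{Sons}'(S') = h[\mathrm{Sons}(S_1) \cup \ldots \cup \mathrm{Sons}(S_n)]$
$\Phi'(S') = \Phi(S_1) \cup \ldots \cup \Phi(S_n)$

$\mathrm{Split}'(S') = \{h[Q] \mid Q \in \mathrm{Split}(S_1) \cup \ldots \cup \mathrm{Split}(S_n)\}$
$\mathrm{Comp}'(S') = \{h[Q] \mid Q \in \mathrm{Comp}(S_1) \cup \ldots \cup \mathrm{Comp}(S_n)\}$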

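DEFINITION 5 (Merge). For any i-diagram $\mathcal{D}$ we define the i-diagram $\mathcal{D}[\mathrm{Merge}(Q)] \stackrel{\text{def}}{=} \mathcal{D}/\rho$ for the equivalence relation $\rho = \{(S_1, S_2) \mid S_1, S_2 \in Q\} \cup \{(S, S) \mid S \in \mathbb{S}\}$.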
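Definitions 4 and 5 can be sketched in Python with nodes represented, as in the paper, by frozensets of set names. The dictionary encoding and the names `factor` and `merge` are illustrative assumptions; the sketch does not remove the self edges or redundant views that merging may create, which is left to Simplify.

```python
def factor(d, classes):
    """D/rho, with rho given by a list of equivalence classes (lists of nodes).
    Nodes are frozensets of set names; nodes in no class form singleton classes."""
    h = {}
    for cls in classes:
        merged = frozenset().union(*cls)        # h(S) = S_1 u ... u S_n
        for node in cls:
            h[node] = merged
    for node in d["nodes"]:
        h.setdefault(node, node)
    new = {"nodes": set(h.values()), "empty": h[d["empty"]], "cinf": {}, "csup": {},
           "sons": {}, "phi": {}, "split": {}, "comp": {}}
    for node in d["nodes"]:
        t = h[node]
        new["cinf"][t] = max(new["cinf"].get(t, 0), d["cinf"][node])
        new["csup"][t] = min(new["csup"].get(t, float("inf")), d["csup"][node])
        new["sons"].setdefault(t, set()).update(h[s] for s in d["sons"].get(node, ()))
        new["phi"].setdefault(t, set()).update(d["phi"].get(node, ()))
        for key in ("split", "comp"):           # h[Q] for every view Q
            new[key].setdefault(t, set()).update(
                frozenset(h[s] for s in q) for q in d[key].get(node, ()))
    return new


def merge(d, group):
    """D[Merge(Q)]: identify all nodes of Q and keep every other node as is."""
    return factor(d, [list(group)])


# Hypothetical example: merging X ([1..4]) and Y ([2..3]) yields the node {x, y} with [2..3].
X, Y, R, E = (frozenset({n}) for n in "xyre")
EXAMPLE = {"nodes": {X, Y, R, E}, "empty": E,
           "cinf": {X: 1, Y: 2, R: 0, E: 0}, "csup": {X: 4, Y: 3, R: 9, E: 0},
           "sons": {R: [X, Y]}, "phi": {X: {("+", "P")}}, "split": {}, "comp": {R: [[X, Y]]}}
```

Taking the maximum of the lower bounds and the minimum of the upper bounds reflects that all merged nodes denote the same set.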

In the sequel we impose the following restrictions on the form 
of i-diagrams. 

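DEFINITION 6 (Simple Diagrams). A diagram $\mathcal{D}$ is simple iff $\mathcal{D} = \bot_d$ or all of the following conditions hold for all $S \in \mathbb{S}$:

a) $(\mathbb{S}, \leadsto)$ has no cycles; in particular, $S \notin \mathrm{Sons}(S)$

b) $\emptyset_d \notin \mathrm{Sons}(S)$

c) $\emptyset \notin \mathrm{Split}(S) \cup \mathrm{Comp}(S)$

d) $\forall Q, Q'.\ Q \in \mathrm{Split}(S) \wedge Q' \subsetneq Q \Rightarrow Q' \notin \mathrm{Split}(S)$

e) $\forall Q, Q'.\ Q \in \mathrm{Comp}(S) \wedge Q' \supsetneq Q \Rightarrow Q' \notin \mathrm{Comp}(S)$

f) $\mathrm{CSup}(\emptyset_d) = 0$ and $\mathrm{Sons}(\emptyset_d) = \Phi(\emptyset_d) = \emptyset$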



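proc Simplify($\mathcal{D}$):
1. use fixpoint iteration to compute $\rho$ as the smallest equivalence relation such that:
   1.1. $S_1 \leadsto^* S_2 \wedge S_2 \leadsto^* S_1 \Rightarrow (S_1, S_2) \in \rho$
   1.2. $(S, \emptyset_d) \in \rho \wedge S_1 \leadsto^* S \Rightarrow (S_1, \emptyset_d) \in \rho$
   1.3. $\emptyset \in \mathrm{Comp}(S) \Rightarrow (S, \emptyset_d) \in \rho$
   1.4. $\mathrm{disj}_{S_0}(S_1, S_2) \wedge (S_1, S_2) \in \rho \Rightarrow (S_2, \emptyset_d) \in \rho$
   1.5. $\mathrm{disj}_{S_0}(S_1, S_2) \wedge (S_0, S_1) \in \rho \Rightarrow (S_2, \emptyset_d) \in \rho$
   1.6. $\{S_1\} \in \mathrm{Comp}(S) \Rightarrow (S, S_1) \in \rho$
2. $\mathcal{D} := \mathcal{D}/\rho$
3. for each $S \in \mathbb{S}$:
     $\mathrm{Split}(S) \leftarrow \{Q - \{\emptyset_d\} \mid Q \in \mathrm{Split}(S),\ S \notin Q\}$
     $\mathrm{Comp}(S) \leftarrow \{Q - \{\emptyset_d\} \mid Q \in \mathrm{Comp}(S),\ S \notin Q\}$
     $\mathrm{Sons}(S) \leftarrow \mathrm{Sons}(S) - \{\emptyset_d, S\}$
4. for each $S \in \mathbb{S}$:
     $\mathrm{Split}(S) \leftarrow \mathrm{Split}(S) - \{\emptyset\} - \{Q \mid \exists Q' \in \mathrm{Split}(S).\ Q' \supsetneq Q\}$
     $\mathrm{Comp}(S) \leftarrow \mathrm{Comp}(S) - \{\emptyset\} - \{Q \mid \exists Q' \in \mathrm{Comp}(S).\ Q' \subsetneq Q\}$
5. $\mathrm{CSup}(\emptyset_d) \leftarrow 0$, $\Phi(\emptyset_d) \leftarrow \emptyset$, $\mathrm{Sons}(\emptyset_d) \leftarrow \emptyset$, $\mathrm{Comp}(\emptyset_d) \leftarrow \emptyset$, $\mathrm{Split}(\emptyset_d) \leftarrow \emptyset$
6. return $\mathcal{D}$

Here $\mathcal{D}[a \leftarrow b]$ denotes the result of updating the component $a$ of the i-diagram $\mathcal{D}$ with value $b$.

Figure 6. Polynomial-time algorithm Simplify to compute an equivalent simple i-diagram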



Simplicity eliminates redundancy from diagrams, but does not restrict their expressive power, as the following lemma shows.

LEMMA 4. For every i-diagram $\mathcal{D}$ we can obtain an equivalent simple i-diagram using the polynomial-time algorithm Simplify in Figure 6.
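Step 1 of Simplify can be sketched with a union-find structure that applies rules 1.1 to 1.6 until nothing changes. This is an illustrative sketch under stated assumptions: nodes are identifiers, $S_1 \leadsto S_2$ is read as $S_1 \in \mathrm{Sons}(S_2)$, $\leadsto^*$ is taken to be reflexive, and steps 2 to 6 (the quotient and the clean-up) are not shown; the function name `simplify_rho` is hypothetical.

```python
def simplify_rho(nodes, empty, sons, split, comp):
    """Step 1 of Simplify (Figure 6): the smallest equivalence relation closed
    under rules 1.1-1.6, returned as a map from each node to a class representative."""
    rep = {n: n for n in nodes}

    def find(n):
        while rep[n] != n:
            n = rep[n]
        return n

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        rep[ra] = rb
        return True

    def below(s):  # reflexive descendants: all S1 with S1 ~>* s
        seen, stack = {s}, [s]
        while stack:
            for child in sons.get(stack.pop(), ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    desc = {n: below(n) for n in nodes}
    changed = True
    while changed:
        changed = False
        for s in nodes:
            for s1 in desc[s]:
                if s in desc[s1]:                     # rule 1.1: s and s1 lie on a cycle
                    changed |= union(s, s1)
                if find(s) == find(empty):            # rule 1.2: below an empty node
                    changed |= union(s1, empty)
            for q in comp.get(s, ()):
                if len(q) == 0:                       # rule 1.3
                    changed |= union(s, empty)
                elif len(q) == 1:                     # rule 1.6
                    changed |= union(s, next(iter(q)))
            for q in split.get(s, ()):                # s plays the role of S0
                for s1 in q:
                    for s2 in q:
                        if s1 != s2 and (find(s1) == find(s2) or find(s) == find(s1)):
                            changed |= union(s2, empty)   # rules 1.4 and 1.5
    return {n: find(n) for n in nodes}
```

For example, if $\{A\} \in \mathrm{Comp}(R)$ and $\{A, B\} \in \mathrm{Split}(R)$, rule 1.6 identifies $R$ with $A$, and rule 1.5 then forces $B$ into the class of $\emptyset_d$.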



5. Sources of NP-Hardness and Definition of I-Trees

The satisfiability of i-diagrams is NP-hard because i-diagrams have the same expressive power as CBAC constraints. We have observed that the general directed acyclic graph structure of i-diagrams allows us to encode NP-complete problems; this motivates the following two restrictions.

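DEFINITION 7.

An i-diagram $\mathcal{D}$ is tree shaped iff $(\mathbb{S}, \leadsto)$ is a tree (with an additional isolated node $\emptyset_d$).

An i-diagram $\mathcal{D}$ has independent views iff for all $S \in \mathbb{S}$ and all distinct $Q_1, Q_2 \in \mathrm{Split}(S) \cup \mathrm{Comp}(S)$, at least one of the following two conditions holds:

• $Q_1 \cap Q_2 = \emptyset$

• $Q_1 \in \mathrm{Split}(S) \wedge Q_2 \in \mathrm{Comp}(S) \wedge Q_1 \subseteq Q_2$.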
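Both conditions of Definition 7 can be tested directly. The following Python sketch assumes that nodes are identifiers, that $S_1 \leadsto S_2$ means $S_1 \in \mathrm{Sons}(S_2)$, and that views are given as lists of nodes; the function names are hypothetical and the code is an illustration rather than the paper's procedure.

```python
from itertools import combinations


def is_tree_shaped(nodes, empty, sons):
    """(S, ~>) is a tree plus the isolated node `empty` (Definition 7)."""
    if sons.get(empty):
        return False
    fathers = {n: set() for n in nodes if n != empty}
    for father, children in sons.items():
        for child in children:
            if child not in fathers:        # no edges into the empty node
                return False
            fathers[child].add(father)
    roots = [n for n, fs in fathers.items() if not fs]
    if len(roots) != 1 or any(len(fs) > 1 for fs in fathers.values()):
        return False
    # With a single father per non-root node, reaching every node from the
    # root rules out cycles.
    seen, stack = {roots[0]}, [roots[0]]
    while stack:
        for child in sons.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen == set(fathers)


def has_independent_views(split, comp):
    """Distinct views of a node are disjoint, or a split view lies inside a complete view."""
    for node in set(split) | set(comp):
        # split views come first, so every mixed pair is ordered (split, comp)
        views = [("split", frozenset(q)) for q in split.get(node, ())]
        views += [("comp", frozenset(q)) for q in comp.get(node, ())]
        for (k1, q1), (k2, q2) in combinations(views, 2):
            nested = k1 == "split" and k2 == "comp" and q1 <= q2
            if q1 & q2 and not nested:
                return False
    return True
```

Two orthogonal decompositions of the same node, each a view in both Split and Comp, pass the independent views test; a split view that overlaps a complete view without being contained in it does not.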

Recall that, by Lemma 4, it suffices to consider i-diagrams with acyclic graphs of the subset relation. The tree shape condition is then a natural next restriction on the structure of i-diagrams. However, due to the presence of Split and Comp, the tree shape condition by itself does not reduce the expressive power of i-diagrams, and further restrictions are necessary. The independent views condition extends the tree condition to the entire graphical representation of i-diagrams, including the circles and squares that represent Split and Comp views. The conjunction of these two conditions can be expressed by saying that the graphical representation of an i-diagram is a tree.

REMARK 1. We can express the combination of the conditions (being simple, being tree shaped, and having independent views) by saying that there are only four kinds of edges in the corresponding graphical representation:²

• from an element $S \in \mathbb{S} - \{\emptyset_d\}$ to a circle,

• from a circle to a square, indicating that all nodes of a split view belong to a complete view,

• from a circle to an element $S \in \mathbb{S} - \{\emptyset_d\}$,

• from a square to an element $S \in \mathbb{S} - \{\emptyset_d\}$.

Unfortunately, the restrictions on tree shape and independent views are not sufficient to guarantee a polynomial-time decision procedure in the presence of predicates associated with nodes. The reason is that the ability to encode disjointness of arbitrary sets leads to NP-hardness, yet even with tree structure and independent views it is possible to assert that two arbitrary sets $S_1$ and $S_2$ are disjoint by letting $(+P) \in \Phi(S_1)$ and $(-P) \in \Phi(S_2)$ for some uninterpreted predicate $P$. A simple way to avoid this problem is to require that $\Phi$ contains only positive atoms $(+P)$. A more flexible restriction is the following.

DEFINITION 8. An i-diagram $\mathcal{D}$ has independent signatures iff for every pair of distinct nodes $S_1, S_2$ such that $(-P) \in \Phi(S_1)$ and $(+P) \in \Phi(S_2)$ for some $P \in \mathrm{PN}$, at least one of the following two conditions holds:

1. $S_1$ and $S_2$ are explicitly disjoint, that is, $\mathrm{disj}_{\mathcal{D}}(S_1, S_2)$;

2. $S_1$ and $S_2$ have compatible signatures, that is, there exists a node $S$ such that

$S_1 \leadsto^* S \wedge S_2 \leadsto^* S \wedge \mathrm{Sig}(S_1) \cap \mathrm{Sig}(S_2) \subseteq \mathrm{Sig}(S)$

where $\mathrm{Sig}(S) = \{P \mid (+P) \in \Phi(S) \vee (-P) \in \Phi(S)\}$.
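The independent signatures condition can be checked naively by examining every pair of nodes with clashing literals. This quadratic Python sketch is an illustration under explicit assumptions: nodes are identifiers, $S_1 \leadsto S_2$ is read as $S_1 \in \mathrm{Sons}(S_2)$, $\leadsto^*$ is reflexive, literals are encoded as `("+", P)` / `("-", P)` pairs, and the function name is hypothetical; it is not claimed to be the authors' procedure.

```python
def has_independent_signatures(nodes, sons, split, phi):
    """Definition 8: whenever (-P) in Phi(S1) and (+P) in Phi(S2), the nodes are
    explicitly disjoint or some common ancestor S has Sig(S1) & Sig(S2) <= Sig(S)."""
    parents = {}
    for father, children in sons.items():
        for child in children:
            parents.setdefault(child, set()).add(father)

    def up(n):  # reflexive ancestors: all S with n ~>* S
        seen, stack = {n}, [n]
        while stack:
            for p in parents.get(stack.pop(), ()):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    def sig(n):
        return {p for _, p in phi.get(n, ())}

    def literals(n, sign):
        return {p for sg, p in phi.get(n, ()) if sg == sign}

    def disj(s1, s2):  # explicit disjointness (Definition 3)
        return any(a1 != a2 and any(a1 in q and a2 in q for q in split.get(s0, ()))
                   for a1 in up(s1) for a2 in up(s2) for s0 in split)

    for s1 in nodes:
        for s2 in nodes:
            if s1 == s2 or not (literals(s1, "-") & literals(s2, "+")):
                continue
            if disj(s1, s2):
                continue
            common = sig(s1) & sig(s2)
            if not any(common <= sig(s) for s in up(s1) & up(s2)):
                return False
    return True
```

Two siblings labeled $-P$ and $+P$ are accepted when they belong to a common split view, or when their father already mentions $P$; otherwise the diagram is rejected.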

The independent signatures condition ensures that any disjointness implied by predicates is either 1) a result of the fact that ancestors of $S_1$ and $S_2$ are explicitly stated as disjoint, or 2) a result of a contradictory predicate assignment (the case when $S_1$ and $S_2$ have compatible signatures, so there exists a common ancestor that resolves which of $(+P)$ or $(-P)$ holds for both $S_1$ and $S_2$).

The discussion above leads to the definition of i-trees, for which we will give polynomial-time algorithms for satisfiability and subsumption in Sections 6 and 7.

DEFINITION 9 (i-trees, $i\mathcal{T}$). An i-tree $\mathcal{T}$ is a simple i-diagram such that $\mathcal{T} = \bot_d$ or such that all of the following three conditions hold:

1. $\mathcal{T}$ is tree shaped

2. $\mathcal{T}$ has independent views

3. $\mathcal{T}$ has independent signatures.

We denote by $i\mathcal{T}$ the set of i-trees.

The following theorem shows that all three conditions in our definition of i-trees are necessary. Its proof is based on a reduction from graph 3-colorability, which can be encoded using slightly different i-diagrams for each of the three cases. The common property of these diagrams is that they can encode disjointness of arbitrary pairs of nodes.

THEOREM 2. Omitting any one of the three conditions from Definition 9 yields a class of diagrams whose satisfiability is NP-hard.

We note that, in addition to NP-hardness, omitting the tree shape or the independent views property in fact retains the full expressive power of CBAC constraints, by an argument similar to the one in Lemma 3.

Our ability to specify i-trees as a natural subclass of i-diagrams justifies the definition of i-diagrams themselves. For example, the definition of i-trees would have been more complex had we chosen to represent disjointness using a binary relation $s_1 \cap s_2 = \emptyset$.

Let us also observe that, despite the imposed restrictions, i-trees are fairly expressive. In particular, they can express the hierarchical decomposition of a set given by a node $S$ into disjoint sets $S_1, \ldots, S_n$, by letting $\{S_1, \ldots, S_n\} \in \mathrm{Split}(S) \cap \mathrm{Comp}(S)$. Despite the independent views condition, we can have multiple orthogonal decompositions, so that also $\{S'_1, \ldots, S'_m\} \in \mathrm{Split}(S) \cap \mathrm{Comp}(S)$ for $\{S'_1, \ldots, S'_m\} \cap \{S_1, \ldots, S_n\} = \emptyset$. This allows i-trees to naturally express generalized typestate constraints.

6. Deciding the Satisfiability of I-Trees 

In this section we prove that the satisfiability of i-trees is decidable in polynomial time. For this purpose we introduce a set of weak consistency conditions $C_i$ (Definition 10) such that:

(6.1) We can enforce weak consistency for any satisfiable i-tree using a rewriting system $\mathcal{R}^w$ (Definition 11) with the following properties (Lemma 5):

• $\mathcal{R}^w$ is semantics-preserving;

• if a non-$\bot_d$ i-tree is in $\mathcal{R}^w$ normal form, then it satisfies the weak consistency conditions;

• for a particular strategy (Figure 9), the system $\mathcal{R}^w$ terminates in polynomial time.

(6.2) Every i-tree that satisfies the weak consistency conditions is satisfiable; Lemma 6 gives an algorithm for constructing a model for any i-tree that satisfies them.

Figure 9 summarizes the polynomial-time satisfiability decision procedure whose correctness (Theorem 3) follows from the results of this section.

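DEFINITION 10 (Weak Consistency). An i-tree $\mathcal{T}$ satisfies weak consistency iff $\mathcal{T} \neq \bot_d$ and $\mathcal{T}$ satisfies the following conditions for all $S \in \mathbb{S}$:

$\forall S' \in \mathrm{Sons}(S).\ \Phi(S') \supseteq \Phi(S)$   $(C_1)$

$\mathrm{CSup}(S) > 0 \Rightarrow \forall P \in \mathrm{PN}.\ \{+P, -P\} \not\subseteq \Phi(S)$   $(C_2)$

$\forall Q \in \mathrm{Comp}(S).\ \mathrm{CSup}(S) \le \Sigma(\mathrm{CSup}[Q])$   $(C_3)$

$\forall Q \in \mathrm{Split}(S).\ \mathrm{CInf}(S) \ge \Sigma(\mathrm{CInf}[Q])$   $(C_4)$

$\mathrm{CInf}(S) \le \mathrm{CSup}(S)$   $(C_5)$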
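The five weak consistency conditions can be checked node by node. In this illustrative Python sketch, $\Sigma(f[Q])$ is read as $\sum_{S' \in Q} f(S')$, nodes are identifiers, literals are `("+", P)` / `("-", P)` pairs, and the function name `weakly_consistent` is hypothetical; the requirement $\mathcal{T} \neq \bot_d$ is assumed rather than checked.

```python
def weakly_consistent(nodes, sons, split, comp, cinf, csup, phi):
    """Check conditions C1-C5 of Definition 10 for a diagram other than bottom_d."""
    for s in nodes:
        own = phi.get(s, set())
        if any(not own <= phi.get(c, set()) for c in sons.get(s, ())):
            return False                                           # C1
        if csup[s] > 0 and any(("+", p) in own and ("-", p) in own for _, p in own):
            return False                                           # C2
        if any(csup[s] > sum(csup[x] for x in q) for q in comp.get(s, ())):
            return False                                           # C3
        if any(cinf[s] < sum(cinf[x] for x in q) for q in split.get(s, ())):
            return False                                           # C4
        if cinf[s] > csup[s]:
            return False                                           # C5
    return True
```

For example, a node with bounds $[0..5]$ whose complete view consists of two nodes with bounds $[0..2]$ violates $C_3$; lowering its upper bound to 4 restores it.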

6.1 A Rewriting System $\mathcal{R}^w$ for Enforcing Weak Consistency

We introduce the following rewriting system to enforce weak consistency properties when possible.

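DEFINITION 11 (System $\mathcal{R}^w$). For each tuple (k, name, condition, effect) in Figure 7, we define a rewriting rule on i-diagrams by

$\mathcal{D} \xrightarrow[\text{spot}]{\text{name}} \mathcal{D}' \iff (\mathcal{D} \neq \bot_d \wedge \text{condition} \wedge \mathcal{D}' = \mathcal{D}[\text{effect}])$

for each assignment spot of the free variables appearing in the condition column. We define $R_k$ by

$\mathcal{D} \xrightarrow{R_k} \mathcal{D}' \stackrel{\text{def}}{\iff} \exists \text{spot}.\ \mathcal{D} \xrightarrow[\text{spot}]{\text{name}} \mathcal{D}'$

We define $\mathcal{R}^w$ as the union of $\xrightarrow{R_j}$ for $1 \le j \le 5$.

² As a result, we can recognize this structure in linear time using, for example, a tree automaton [12].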



v 
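k | name  | condition | effect
1 | DnPhi | a1) $S \in \mathrm{Sons}(S')$; b1) $\Phi_n = \Phi(S) \cup \Phi(S')$; c1) $\Phi_n \not\subseteq \Phi(S)$ | $\Phi(S) \leftarrow \Phi_n$
2 | Unsat | a2) $\{+P, -P\} \subseteq \Phi(S)$; b2) $n = 0$; c2) $\mathrm{CSup}(S) > n$ | $\mathrm{CSup}(S) \leftarrow n$
3 | UpSup | a3) $Q \in \mathrm{Comp}(S)$; b3) $n = \Sigma(\mathrm{CSup}[Q])$; c3) $\mathrm{CSup}(S) > n$ | $\mathrm{CSup}(S) \leftarrow n$
4 | UpInf | a4) $Q \in \mathrm{Split}(S)$; b4) $n = \Sigma(\mathrm{CInf}[Q])$; c4) $\mathrm{CInf}(S) < n$ | $\mathrm{CInf}(S) \leftarrow n$
5 | Error | a5) $\mathrm{CInf}(S) > \mathrm{CSup}(S)$ | $\mathcal{D} \leftarrow \bot_d$

Figure 7. System $\mathcal{R}^w$ for ensuring weak consistency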
Figure 8. An example sequence of rewriting steps for $\mathcal{R}^w$. [Diagram not reproduced; the panels show an i-tree after applying DnPhi, Unsat, UpSup, and Error in turn, ending in $\bot_d$.]



Figure 8 shows an example sequence of rewriting steps applied to 
an i-tree. 



LEMMA 5 (Properties of $\mathcal{R}^w$).

1. $\mathcal{R}^w$ is $i\mathcal{T}$-stable, that is, $\mathcal{T} \in i\mathcal{T} \wedge \mathcal{T} \xrightarrow{\mathcal{R}^w} \mathcal{T}' \Rightarrow \mathcal{T}' \in i\mathcal{T}$

2. $\mathcal{R}^w$ preserves the semantics, that is, $\mathcal{D} \xrightarrow{\mathcal{R}^w} \mathcal{D}' \Rightarrow \mathcal{D} \equiv \mathcal{D}'$

3. $\mathcal{R}^w$ enforces weak consistency when possible, that is, a diagram $\mathcal{D}$ in $\mathcal{R}^w$ normal form is either equal to $\bot_d$ or it is weakly consistent

4. $\mathcal{R}^w$ terminates in polynomial time for the strategy corresponding to the algorithm $R^w_{NF}$ in Figure 9.

Proof sketch.

1. Follows easily from the fact that $\mathcal{R}^w$ rules do not modify Sons, Split, or Comp.

2. Follows by construction of the $\mathcal{R}^w$ rules. Suppose $\mathcal{D} \xrightarrow{\mathcal{R}^w} \mathcal{D}'$. Then $\mathcal{D} \models \mathcal{D}'$ follows from conditions $a_i$ ($1 \le i \le 5$), and $\mathcal{D}' \models \mathcal{D}$ follows from conditions $c_i$ ($1 \le i \le 4$).

3. For every $k = 1..5$, the condition of application of the rule $R_k$ corresponds to the negation of $C_k$. When a diagram is in normal form for the rule $R_k$, it either satisfies $C_k$ or is $\bot_d$.

4. To prove that $R^w_{NF}$ corresponds to a polynomial strategy, we prove by induction that applying the rule $R_k$ in the specified direction (from the root to the leaves or from the leaves to the root) enforces $C_k$ everywhere, and when $C_k$ holds, the rule is no longer applicable. Finally, we prove that each rule $R_k$ for $k = 1..5$ preserves the conjunction of properties $\bigwedge_{i=1}^{k-1} C_i$, and as a consequence, we never need to reapply any of the rules $R_j$ for $j < k$. ■

proc $R^w_{NF}(\mathcal{T})$
1. for every $S' \in \mathbb{S}$ from the root to the leaves
     for every $S \in \mathrm{Sons}(S')$
       try to apply DnPhi$(S, S')$ to $\mathcal{T}$
2. for every $S \in \mathbb{S}$
     for every $P \in \mathrm{PN}$
       try to apply Unsat$(S, P)$ to $\mathcal{T}$
3. for every $S \in \mathbb{S}$ from the leaves to the root
     for every $Q \in \mathrm{Comp}(S)$
       try to apply UpSup$(S, Q)$ to $\mathcal{T}$
4. for every $S \in \mathbb{S}$ from the leaves to the root
     for every $Q \in \mathrm{Split}(S)$
       try to apply UpInf$(S, Q)$ to $\mathcal{T}$
5. for every $S \in \mathbb{S}$
     try to apply Error$(S)$ to $\mathcal{T}$
   return $\mathcal{T}$

proc ItreeSAT($\mathcal{T}$)
   if ($R^w_{NF}(\mathcal{T}) \neq \bot_d$) return satisfiable
   else return unsatisfiable

Figure 9. Polynomial-time algorithms $R^w_{NF}$ and ItreeSAT to compute the $\mathcal{R}^w$ normal form and check satisfiability of i-trees
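The strategy of Figure 9 can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the i-tree is given by its root, nodes are identifiers, the isolated node $\emptyset_d$ is left out (it is assumed to satisfy $\mathrm{CInf}(\emptyset_d) = 0$), each rule application is written as a `min`/`max` update that coincides with the rule's condition together with its effect, and $\bot_d$ is represented by `None`. The names `weak_nf` and `itree_sat` are hypothetical.

```python
from collections import deque


def weak_nf(root, sons, split, comp, cinf, csup, phi):
    """Sketch of R^w_NF (Figure 9) for an i-tree with the given root.
    Returns the normalized (cinf, csup, phi), or None when the result is bottom_d."""
    cinf, csup = dict(cinf), dict(csup)
    phi = {n: set(v) for n, v in phi.items()}
    order, queue = [], deque([root])
    while queue:                                  # root-to-leaves (BFS) order
        node = queue.popleft()
        order.append(node)
        queue.extend(sons.get(node, ()))
    for father in order:                          # step 1: DnPhi
        for son in sons.get(father, ()):
            phi[son] = phi.get(son, set()) | phi.get(father, set())
    for s in order:                               # step 2: Unsat
        own = phi.get(s, set())
        if any(("+", p) in own and ("-", p) in own for _, p in own):
            csup[s] = min(csup[s], 0)
    for s in reversed(order):                     # step 3: UpSup, leaves to root
        for q in comp.get(s, ()):
            csup[s] = min(csup[s], sum(csup[x] for x in q))
    for s in reversed(order):                     # step 4: UpInf, leaves to root
        for q in split.get(s, ()):
            cinf[s] = max(cinf[s], sum(cinf[x] for x in q))
    if any(cinf[s] > csup[s] for s in order):     # step 5: Error
        return None
    return cinf, csup, phi


def itree_sat(root, sons, split, comp, cinf, csup, phi):
    """ItreeSAT: satisfiable iff the R^w normal form is not bottom_d (Theorem 3)."""
    return weak_nf(root, sons, split, comp, cinf, csup, phi) is not None
```

For example, a root with bounds $[3..10]$ that is partitioned into two nodes with bounds $[0..1]$ is unsatisfiable: UpSup lowers the root's upper bound to 2, and Error then fires.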

6.2 Constructing Models for Weakly Consistent I-Trees 

The following Lemma 6 is crucial for the completeness of our algorithm, and justifies the definition of weak consistency.

LEMMA 6 (Model Construction). If an i-tree $\mathcal{T}$ is weakly consistent, then we can construct a model for $\mathcal{T}$.

The high-level idea of the proof of Lemma 6 is to first build the first two components $(A, \alpha)$ of the model, and then extend the model with $\beta$ using the independent signatures condition for i-trees. We build the $(A, \alpha)$ part of the model by building a model for each subtree, using induction on the height of the i-tree. To construct models that satisfy Split and Comp constraints in the inductive step, we use a stronger induction hypothesis: we show that there exists a model $(A, \alpha)$ for a tree rooted in node $S$ with $|A| = k$ for all $\mathrm{CInf}(S) \le k \le \mathrm{CSup}(S)$, and we rely on the properties of weak consistency to prove the inductive step. The proof of this lemma is interesting because similar ideas are used when building the example models that show completeness in Section 7.

Putting all results in this section together using the argument at the beginning of the section, we obtain the following theorem.

THEOREM 3 (ItreeSAT Correctness). $\mathcal{T}$ is satisfiable if and only if $R^w_{NF}(\mathcal{T}) \neq \bot_d$. Therefore, the algorithm ItreeSAT in Figure 9 is a sound and complete polynomial-time decision procedure for the satisfiability of i-trees.

7. Deciding Subsumption of I-Trees 

The goal of this section is to prove that we can decide the subsumption of i-trees in polynomial time. Note that the subclass of i-trees is not closed under negation or implication, so we cannot decide $\mathcal{T} \models \mathcal{T}'$ by checking the satisfiability of $\neg(\mathcal{T} \Rightarrow \mathcal{T}')$. Instead, our approach is to bring $\mathcal{T}$ into a form where the properties of the models of $\mathcal{T}$ are easy to read from $\mathcal{T}$. We then check that $\mathcal{T}$ entails each of the conditions that correspond to the semantics of $\mathcal{T}'$.
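k  | name    | condition | effect
6  | DnInf   | a6) $(\{S\} \uplus Q_0) \in \mathrm{Comp}(S')$; b6) $n = \mathrm{CInf}(S') - \Sigma(\mathrm{CSup}[Q_0])$; c6) $n > \mathrm{CInf}(S)$ | $\mathrm{CInf}(S) \leftarrow n$
7  | DnSup   | a7) $(\{S\} \uplus Q_0) \in \mathrm{Split}(S')$; b7) $n = \mathrm{CSup}(S') - \Sigma(\mathrm{CInf}[Q_0])$; c7) $n < \mathrm{CSup}(S)$ | $\mathrm{CSup}(S) \leftarrow n$
8  | CCmp*   | a8) $Q \in \mathrm{Split}(S) \wedge \mathrm{CSup}(S) \le \Sigma(\mathrm{CInf}[Q])$; b8) $C_n = \mathrm{Comp}(S) \cup \{Q\}$; c8) $C_n \neq \mathrm{Comp}(S)$ | $\mathrm{Comp}(S) \leftarrow C_n$
9  | CSplit* | a9) $Q \in \mathrm{Comp}(S) \wedge \mathrm{CInf}(S) \ge \Sigma(\mathrm{CSup}[Q])$; b9) $C_n = \mathrm{Split}(S) \cup \{Q\}$; c9) $C_n \neq \mathrm{Split}(S)$ | $\mathrm{Split}(S) \leftarrow C_n$
10 | UpPhi   | a10) $Q \in \mathrm{Comp}(S)$; b10) $\Phi_n = \Phi(S) \cup \bigcap \Phi[Q]$; c10) $\Phi_n \not\subseteq \Phi(S)$ | $\Phi(S) \leftarrow \Phi_n$
11 | Void*   | a11) $S \neq \emptyset_d \wedge \mathrm{CSup}(S) = 0$ | $\mathrm{Merge}(\{S, \emptyset_d\})$
12 | Equal*  | a12) $\{S'\} \in \mathrm{Comp}(S)$ | $\mathrm{Merge}(\{S, S'\})$

* Follow the application of these rules by Simplify.

Figure 10. Rules for System $\mathcal{R}$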



We formalize the intuitive condition of being easy to read in the notion of strong consistency. We build on the system $\mathcal{R}^w$ from the previous section to create a larger rewriting system $\mathcal{R}$ for ensuring strong consistency. We introduce a polynomial-time strategy for $\mathcal{R}$ that transforms every i-tree into $\bot_d$ or into an i-tree that is strongly consistent, and we give polynomial-time algorithms for extracting the information from strongly consistent i-trees.

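DEFINITION 12 (Strong Consistency). An i-tree $\mathcal{T}$ is strongly consistent iff it is weakly consistent and satisfies all of the following properties for all $S \in \mathbb{S}$:

$\forall Q \in \mathrm{Comp}(S).\ \forall S_0 \in Q.\ \mathrm{CInf}(S_0) \ge \mathrm{CInf}(S) - \Sigma(\mathrm{CSup}[Q - \{S_0\}])$   $(C_6)$

$\forall Q \in \mathrm{Split}(S).\ \forall S_0 \in Q.\ \mathrm{CSup}(S_0) \le \mathrm{CSup}(S) - \Sigma(\mathrm{CInf}[Q - \{S_0\}])$   $(C_7)$

$\forall Q \in \mathrm{Split}(S).\ Q \notin \mathrm{Comp}(S) \Rightarrow \mathrm{CSup}(S) > \Sigma(\mathrm{CInf}[Q])$   $(C_8)$

$\forall Q \in \mathrm{Comp}(S).\ Q \notin \mathrm{Split}(S) \Rightarrow \mathrm{CInf}(S) < \Sigma(\mathrm{CSup}[Q])$   $(C_9)$

$\forall Q \in \mathrm{Comp}(S).\ \bigcap(\Phi[Q]) \subseteq \Phi(S)$   $(C_{10})$

$S \neq \emptyset_d \Rightarrow \mathrm{CSup}(S) > 0$   $(C_{11})$

$Q \in \mathrm{Comp}(S) \Rightarrow |Q| > 1$   $(C_{12})$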
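For testing purposes, the bound-related conditions of Definition 12 can be checked directly. The Python sketch below is illustrative: it reads $\Sigma(f[Q])$ as a sum over $Q$, treats views as sets of node identifiers, omits the predicate condition $C_{10}$, and assumes that the input already satisfies weak consistency; `strong_bound_violations` is a hypothetical name.

```python
def strong_bound_violations(nodes, empty, split, comp, cinf, csup):
    """Return the violated bound conditions of Definition 12 (C6-C9, C11, C12)
    as tuples; the predicate condition C10 is not checked here."""
    bad = []
    for s in nodes:
        splits = [frozenset(q) for q in split.get(s, ())]
        comps = [frozenset(q) for q in comp.get(s, ())]
        for q in comps:
            for s0 in q:
                if cinf[s0] < cinf[s] - sum(csup[x] for x in q - {s0}):
                    bad.append(("C6", s, s0))
            if q not in splits and cinf[s] >= sum(csup[x] for x in q):
                bad.append(("C9", s))
            if len(q) <= 1:
                bad.append(("C12", s))
        for q in splits:
            for s0 in q:
                if csup[s0] > csup[s] - sum(cinf[x] for x in q - {s0}):
                    bad.append(("C7", s, s0))
            if q not in comps and csup[s] <= sum(cinf[x] for x in q):
                bad.append(("C8", s))
        if s != empty and csup[s] == 0:
            bad.append(("C11", s))
    return bad
```

For instance, if a node with bounds $[3..3]$ is covered by two nodes with bounds $[0..2]$, each of them must contain at least one element, so $C_6$ reports both lower bounds of 0 as too weak.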
7.1 A Rewriting System $\mathcal{R}$ to Enforce Strong Consistency

This section follows the development of Section 6.1.

DEFINITION 13 (System $\mathcal{R}$). The system $\mathcal{R}$ extends $\mathcal{R}^w$ with the additional rules of Figure 10, analogously to Definition 11.

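LEMMA 7 (Properties of $\mathcal{R}$).

1. $\mathcal{R}$ is $i\mathcal{T}$-stable, that is, $\mathcal{T} \in i\mathcal{T} \wedge \mathcal{T} \xrightarrow{\mathcal{R}} \mathcal{T}' \Rightarrow \mathcal{T}' \in i\mathcal{T}$

2. $\mathcal{R}$ preserves the semantics, that is, $\mathcal{D} \xrightarrow{\mathcal{R}} \mathcal{D}' \Rightarrow \mathcal{D} \equiv \mathcal{D}'$

3. $\mathcal{R}$ enforces strong consistency when possible, that is, a diagram $\mathcal{D}$ in $\mathcal{R}$ normal form is either equal to $\bot_d$ or it is strongly consistent.

4. $\mathcal{R}$ terminates in polynomial time for the strategy corresponding to the algorithm $R_{NF}$ described in Figure 11.

proc $R_{NF}(\mathcal{T})$
1...5. $\mathcal{T} \leftarrow R^w_{NF}(\mathcal{T})$
6. for each $S' \in \mathbb{S}$ from the root to the leaves
     for each $Q \in \mathrm{Comp}(S')$
       for every $S \in Q$
         try DnInf$(S, S', Q)$
7. for each $S' \in \mathbb{S}$ from the root to the leaves
     for each $Q \in \mathrm{Split}(S')$
       for each $S \in Q$
         try DnSup$(S, S', Q)$
8. for each $S \in \mathbb{S}$
     for each $Q \in \mathrm{Split}(S)$
       try CCmp$(S, Q)$
9. for each $S \in \mathbb{S}$
     for each $Q \in \mathrm{Comp}(S)$
       try CSplit$(S, Q)$
10. for each $S \in \mathbb{S}$ from the leaves to the root
      for each $Q \in \mathrm{Comp}(S)$
        try UpPhi$(S, Q)$
11. for each $S \in \mathbb{S}$ from the leaves to the root
      try Void$(S)$
12. for each $S \in \mathbb{S}$
      for each $Q \in \mathrm{Comp}(S)$
        try Equal$(S, Q)$
    return $\mathcal{T}$

Figure 11. Polynomial-time algorithm $R_{NF}(\mathcal{T})$ to compute the $\mathcal{R}$ normal form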



Proof sketch.

1. The $i\mathcal{T}$-stability is trivial for the rules DnInf, DnSup, and UpPhi. The other rules are marked with a star, and for them we use the algorithm Simplify. In fact, we can show that it is not necessary to apply Simplify in its full generality, but only to remove any redundant views introduced by CCmp and CSplit, to remove any self edges introduced by the operation Merge used in the rules Equal and Void, and to remove the edges going to $\emptyset_d$ that can be introduced by the rule Void.

2, 3. Follow by construction, as in the previous section.

4. This part is significantly more difficult than for the system $\mathcal{R}^w$, because the interactions between the rules are more complex, but it follows the same structure as the proof for $\mathcal{R}^w$. ■
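Steps 6 and 7 of $R_{NF}$ only move cardinality bounds downward, so they can be sketched independently of the rest of Figure 11. In this illustrative Python sketch, nodes are identifiers, `order` lists fathers before sons (an assumption about how the traversal is supplied), and only DnInf and DnSup are applied; the starred rules and Simplify are not modeled, and `push_bounds_down` is a hypothetical name.

```python
def push_bounds_down(order, split, comp, cinf, csup):
    """Steps 6 and 7 of R_NF (Figure 11): apply DnInf and then DnSup from the
    root to the leaves. `order` lists the nodes with fathers before sons."""
    cinf, csup = dict(cinf), dict(csup)
    for father in order:                                  # step 6: DnInf
        for q in comp.get(father, ()):
            for s in q:
                # the other members of the complete view cover at most sum(CSup)
                n = cinf[father] - sum(csup[x] for x in q if x != s)
                cinf[s] = max(cinf[s], n)
    for father in order:                                  # step 7: DnSup
        for q in split.get(father, ()):
            for s in q:
                # the other members of the split view use at least sum(CInf)
                n = csup[father] - sum(cinf[x] for x in q if x != s)
                csup[s] = min(csup[s], n)
    return cinf, csup
```

For a root with bounds $[4..4]$ partitioned into $A$ with $[0..1]$ and $B$ with $[0..5]$, the sketch tightens $B$ to $[3..4]$ and leaves $A$ at $[0..1]$.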



7.2 Extracting Information from Strongly Consistent I-Trees 

In this section we start from a strongly consistent i-tree $\mathcal{T}$ and consider the problem of checking $\mathcal{T} \models \mathcal{D}'$. Analyzing Definition 2, we observe that a diagram corresponds to a conjunction of constraints. Therefore, the subsumption problem $\mathcal{T} \models \mathcal{D}'$ corresponds to the problem of verifying that $\mathcal{T}$ entails atomic formulas of the form $s = \emptyset$, $s_1 = s_2$, $s_1 \subseteq s_2$, $a \le |s| \le b$, $s \subseteq P$, $s \subseteq P^c$, $s_1 \cap s_2 = \emptyset$, and $s \subseteq \bigcup\{s_1, \ldots, s_n\}$. Without danger of confusion, we write $\mathcal{T} \models A$ when the atomic formula $A$ holds in all models of $\mathcal{T}$.

THEOREM 4. Let $\mathcal{T}$ be a strongly consistent i-tree and, for an atomic formula $A$, let $\vdash_A$ be as defined in Figure 12. Then $\mathcal{T} \models A$ if and only if $\vdash_A$.
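proc Subsumes($\mathcal{T}$, $\mathcal{D}'$)
   $\mathcal{T} := R_{NF}(\mathcal{T})$
   let $f : \mathrm{SN} \to \mathbb{S}$ be such that $\forall s \in \mathrm{SN}.\ s \in f(s)$
   let $h : \mathbb{S}' \to \mathrm{SN}$ be any function such that $\forall S' \in \mathbb{S}'.\ h(S') \in S'$
   check all of the following conditions:
   1. $\bigwedge_{S \in \mathbb{S}'} \bigwedge_{s_1, s_2 \in S} \vdash_{s_1 = s_2}$
        where $\vdash_{s_1 = s_2} \iff f(s_1) = f(s_2)$
   2. $\vdash_{h(\emptyset'_d) = \emptyset}$
        where $\vdash_{s = \emptyset} \iff f(s) = \emptyset_d$
   3. $\bigwedge_{S \in \mathbb{S}'} \vdash_{\mathrm{CInf}'(S) \le |h(S)| \le \mathrm{CSup}'(S)}$
        where $\vdash_{a \le |s| \le b} \iff a \le \mathrm{CInf}(f(s)) \wedge \mathrm{CSup}(f(s)) \le b$
   4. $\bigwedge_{S \in \mathbb{S}'} \bigwedge_{(+P) \in \Phi'(S)} \vdash_{h(S) \subseteq P}$
        where $\vdash_{s \subseteq P} \iff (+P) \in \Phi(f(s))$
   5. $\bigwedge_{S \in \mathbb{S}'} \bigwedge_{(-P) \in \Phi'(S)} \vdash_{h(S) \subseteq P^c}$
        where $\vdash_{s \subseteq P^c} \iff (-P) \in \Phi(f(s))$
   6. $\bigwedge_{S \in \mathbb{S}'} \bigwedge_{S' \in \mathrm{Sons}'(S)} \vdash_{h(S') \subseteq h(S)}$
        where $\vdash_{s_1 \subseteq s_2} \iff f(s_1) = \emptyset_d \vee f(s_1) \leadsto^* f(s_2)$
   7. $\bigwedge_{S \in \mathbb{S}'} \bigwedge_{Q \in \mathrm{Split}'(S)} \bigwedge_{S_1, S_2 \in Q,\ S_1 \neq S_2} \vdash_{h(S_1) \cap h(S_2) = \emptyset}$
        where $\vdash_{s_1 \cap s_2 = \emptyset} \iff f(s_1) = \emptyset_d \vee f(s_2) = \emptyset_d \vee \mathrm{disj}_{\mathcal{T}}(f(s_1), f(s_2))$
   8. $\bigwedge_{S \in \mathbb{S}'} \bigwedge_{Q \in \mathrm{Comp}'(S)} \vdash_{h(S) \subseteq \bigcup h[Q]}$
        where $\vdash_{s \subseteq \bigcup Z} \iff f(s) = \emptyset_d \vee \mathrm{Included}(f(s), f[Z], \mathcal{T})$

 where proc Included($S_0$, $C$, $\mathcal{T}$)
   return $\bigvee_{S_0 \leadsto^* S} \mathrm{Incl}(S)$

 proc Incl($S$)
   if $S \in C$ then return true
   else return $\bigvee_{Q \in \mathrm{Comp}(S)} \bigwedge_{S' \in Q} \mathrm{Incl}(S')$

Figure 12. An algorithm for computing $\mathcal{T} \models \mathcal{D}'$ for an i-tree $\mathcal{T}$ and an arbitrary diagram $\mathcal{D}'$.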
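The procedures Included and Incl of Figure 12 translate almost directly into Python for a tree-shaped diagram. The sketch assumes identifier nodes, a single father per node, and complete views given as lists; memoizing Incl ensures each node is evaluated once. The function name `included` and its argument order are illustrative.

```python
def included(s0, cover, sons, comp):
    """Included(S0, C, T) from Figure 12 for an i-tree: some reflexive ancestor S
    of S0 satisfies Incl(S)."""
    father = {c: f for f, cs in sons.items() for c in cs}   # one father per node
    memo = {}

    def incl(s):
        # Incl(S): S is in C, or some complete view of S has all members Incl.
        if s not in memo:
            memo[s] = s in cover or any(all(incl(x) for x in q) for q in comp.get(s, ()))
        return memo[s]

    node = s0
    while True:
        if incl(node):
            return True
        if node not in father:                                # reached the root
            return False
        node = father[node]
```

For example, if $R$ is completely covered by $A$ and $B$, and $A$ by $A_1$ and $A_2$, then $R \subseteq A_1 \cup A_2 \cup B$ is detected, while $B \subseteq A$ is not.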



It is easy to verify that $\vdash_A$ implies $\mathcal{T} \models A$. The proof of the converse is based on the following two lemmas, which provide a link between strong and weak consistency.

LEMMA 8 (Bounds Refinement). Let $\mathcal{T}$ be a strongly consistent i-tree, $S \in \mathbb{S}$, and $i, s$ such that $\mathrm{CInf}(S) \le i \le s \le \mathrm{CSup}(S)$; let $\mathcal{T}' = \mathcal{T}[\mathrm{CInf}(S) \leftarrow i, \mathrm{CSup}(S) \leftarrow s]$ and $\mathcal{T}'_{NF} = R^w_{NF}(\mathcal{T}')$. Then 1) $\mathcal{T}'_{NF} \neq \bot_d$, 2) $\mathcal{T}'_{NF} \models \mathcal{T}$, and 3) if $\neg(S \leadsto^* S_0)$, then

$(\mathrm{CInf}(S_0), \mathrm{CSup}(S_0)) = (\mathrm{CInf}'_{NF}(S_0), \mathrm{CSup}'_{NF}(S_0))$.

Proof sketch.

1. We prove this result by induction on the depth of $S$ in the tree $(\mathbb{S}, \leadsto)$. The key step of this proof is to show that the application of UpSup and/or UpInf to the father $S'$ of $S$ does not produce a situation where $a_5$ holds in the resulting diagram $\mathcal{T}''$ (and therefore the rule Error is not applicable in $\mathcal{T}''$). We use the fact that $R^w_{NF}$ applies the rules UpInf and UpSup bottom up, and prove that each application preserves $C_5$, only increases CInf, and only decreases CSup. At each step we distinguish three cases:

(a) both UpSup and UpInf are applicable; then the result follows from $C_5$;

(b) only UpSup is applicable; then the result follows from $C_6$;

(c) only UpInf is applicable; then the result follows from $C_7$.

2. Follows easily from the hypothesis $\mathrm{CInf}(S) \le i \le s \le \mathrm{CSup}(S)$ and the fact that $R^w_{NF}$ is semantics preserving.

3. It is enough to notice that only the rules UpInf and UpSup are used when applying $R^w_{NF}$, and these rules are applied in the bottom-up direction. ■

The fact that the resulting i-tree $\mathcal{T}'_{NF}$ is no longer strongly consistent prevents us from applying this lemma twice starting from a given strongly consistent i-tree. To enforce more than one restriction, we need to refine the bounds of several nodes simultaneously. For this purpose, we use the following lemma.

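LEMMA 9 (Parallel Bounds Refinement). Let $\mathcal{T}$ be a strongly consistent i-tree, and $(Q_0, \leadsto)$ a subtree of $\mathcal{T}$ such that

• the nodes of $Q_0$ are pairwise independent, that is, $\forall S_1, S_2 \in Q_0.\ \neg\mathrm{disj}_{\mathcal{T}}(S_1, S_2)$;

• $(Q_0, \leadsto)$ has the same root as $\mathcal{T}$.

Then the i-tree $\mathcal{T}'$ defined by the simultaneous update

$\mathcal{T}' \stackrel{\text{def}}{=} \mathcal{T}[\forall S \in Q_0 : \mathrm{CInf}(S) \leftarrow \mathrm{CSup}(S)]$

is such that its $\mathcal{R}^w$ normal form $\mathcal{T}'_{NF} = R^w_{NF}(\mathcal{T}')$ satisfies

1. $\mathcal{T}'_{NF} \neq \bot_d$

2. $\mathcal{T}'_{NF} \models \mathcal{T}$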

Lemmas 8 and 9 are the basic tools we need to show that the information syntactically computed from an i-tree is the most precise information computable from the semantics of the i-tree. We prove this property for each of the atomic formulas $A$.

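LEMMA 10. If an i-tree $\mathcal{T}$ is strongly consistent, then for all $S \in \mathbb{S}$ we have

$S \neq \emptyset_d \Rightarrow \exists M.\ M \models \mathcal{T} \wedge \alpha_M(S) \neq \emptyset$

Proof. If $S \neq \emptyset_d$, we have $\mathrm{CSup}(S) > 0$ by $C_{11}$, and therefore the i-tree $\mathcal{T}' \stackrel{\text{def}}{=} \mathcal{T}[\mathrm{CInf}(S) \leftarrow \max(1, \mathrm{CInf}(S))]$ subsumes $\mathcal{T}$. By Lemma 8, $\mathcal{T}'$ is satisfiable, and we can take any model of $\mathcal{T}'$ as a model of $\mathcal{T}$. ■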

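LEMMA 11. If an i-tree $\mathcal{T}$ is strongly consistent, then for all $S \in \mathbb{S}$ we have

$\exists M.\ M \models \mathcal{T} \wedge |\alpha_M(S)| = \mathrm{CInf}(S)$
$\exists M.\ M \models \mathcal{T} \wedge |\alpha_M(S)| = \mathrm{CSup}(S)$

Proof. According to Lemma 8, the two i-trees $\mathcal{T}'_1 \stackrel{\text{def}}{=} \mathcal{T}[\mathrm{CSup}(S) \leftarrow \mathrm{CInf}(S)]$ and $\mathcal{T}'_2 \stackrel{\text{def}}{=} \mathcal{T}[\mathrm{CInf}(S) \leftarrow \mathrm{CSup}(S)]$ are satisfiable, and both $\mathcal{T}'_1$ and $\mathcal{T}'_2$ trivially subsume $\mathcal{T}$. Any model $M_1$ of $\mathcal{T}'_1$ is such that $|\alpha_{M_1}(S)| = \mathrm{CInf}(S)$, and any model $M_2$ of $\mathcal{T}'_2$ is such that $|\alpha_{M_2}(S)| = \mathrm{CSup}(S)$. ■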

LEMMA 12. If an i-tree $\mathcal{T}$ is strongly consistent, $S_0 \in \mathbb{S}$, $C \subseteq \mathbb{S}$, and Included$(S_0, C, \mathcal{T})$ returns false, then

$\exists M.\ M \models \mathcal{T} \wedge \alpha_M(S_0) \not\subseteq \bigcup \alpha_M[C]$

Proof sketch. Assume that Included returns false. We argue that the model $M$ exists in several steps. Let $Q_0$ be the smallest set of nodes such that:

$S_0 \leadsto^* S \Rightarrow S \in Q_0$
$(S \in Q_0 \wedge Q \in \mathrm{Comp}(S) \wedge S_i \in Q \wedge \neg\mathrm{Incl}(S_i)) \Rightarrow S_i \in Q_0$

By definition of Incl, we have $Q_0 \cap C = \emptyset$.

$Q_0$ is tree-shaped by construction, but may contain two nodes that are explicitly disjoint. We therefore compute a subtree $Q_1$ of $Q_0$ by starting from the root and keeping at most one son for each complete view. In this process we ensure that $Q_1$ contains $S_0$, by avoiding cutting the branch that leads to $S_0$.

We then apply Lemma 9 to $Q_1$ and construct a model of the resulting i-tree while enforcing that a certain element $x \in \alpha(S)$ is such that $x \in \alpha(S') \iff S' \in Q_1$ for all $S' \in \mathbb{S}$. More precisely, we prove by induction on $n$ that for each node $S_i$ of $Q_1$ of height $n$ in the tree $Q_1$ we can construct a model $(A, \alpha, \beta)$ for the sub-i-tree of $\mathcal{T}$ with root $S_i$ such that

$\forall S'.\ S' \leadsto^* S_i \Rightarrow (x \in \alpha(S') \iff S' \in Q_1)$

If $n = 0$, then $S_i$ is a leaf of $(Q_1, \leadsto)$ and has no complete view by construction of $Q_1$. Then using $C_8$ we show that we can construct a model of the sub-i-tree with root $S_i$ containing a fresh element (not included in any of the sons of $S_i$).

If $n > 0$, we can deal with the split views in the same way, but this time $S_i$ can have some complete views. If such a complete view contains a unique split view, we avoid merging $x$ in a son of $S_i$ with elements in the other sons of $S_i$. If there exists more than one split view, we can use $C_8$ and construct the model using a refinement of the ideas of Lemma 6.

Finally, since $S_0 \in Q_1$, we have $x \in \alpha(S_0)$, and since $C \cap Q_1 = \emptyset$, we have $x \notin \bigcup \alpha[C]$. ■

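LEMMA 13. If an i-tree $\mathcal{T}$ is strongly consistent, then for all $S_1, S_2 \in \mathbb{S}$ such that $S_1 \neq \emptyset_d$ and $S_2 \neq \emptyset_d$ we have

$\neg(S_1 \leadsto^* S_2) \Rightarrow \exists M.\ M \models \mathcal{T} \wedge \alpha_M(S_1) \not\subseteq \alpha_M(S_2)$

Proof. The property $\exists M.\ M \models \mathcal{T} \wedge \alpha_M(S_1) \not\subseteq \alpha_M(S_2)$ can be checked using Included$(S_1, \{S_2\}, \mathcal{T})$. Using $C_{12}$ we show that this test is equivalent to testing $S_1 \leadsto^* S_2$. ■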

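LEMMA 14. If an i-tree $\mathcal{T}$ is strongly consistent, then for all $S_1, S_2 \in \mathbb{S}$ we have

$S_1 \neq S_2 \Rightarrow \exists M.\ M \models \mathcal{T} \wedge \alpha_M(S_1) \neq \alpha_M(S_2)$

Proof. If $S_1 \neq S_2$, then $\neg(S_1 \leadsto^* S_2)$ or $\neg(S_2 \leadsto^* S_1)$. In either case the result follows from Lemma 13. ■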

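LEMMA 15. If an i-tree $\mathcal{T}$ is strongly consistent, then for all $S \in \mathbb{S}$ and $P \in \mathrm{PN}$ we have

$(+P) \notin \Phi(S) \Rightarrow \exists M.\ M \models \mathcal{T} \wedge \alpha_M(S) \not\subseteq \beta_M(P)$
$(-P) \notin \Phi(S) \Rightarrow \exists M.\ M \models \mathcal{T} \wedge \alpha_M(S) \not\subseteq \beta_M(P)^c$

Proof sketch. Let $S \in \mathbb{S}$ be such that $(+P) \notin \Phi(S)$. We define $Q_P \stackrel{\text{def}}{=} \{S' \in \mathbb{S} \mid (+P) \in \Phi(S')\}$. Using $C_{10}$ and $C_1$ we show that Included$(S, Q_P, \mathcal{T})$ returns false. By Lemma 12, there exists a model such that $\alpha(S) \not\subseteq \bigcup \alpha[Q_P]$. We then change the model by redefining $\beta$ on $P$ as $\beta'(P) = \bigcup \alpha[Q_P]$, so that $\alpha(S) \not\subseteq \beta'(P)$. The case $(-P) \notin \Phi(S)$ is dual and follows from the previous case by swapping $(+P)$ and $(-P)$ in the i-tree and taking complements of $\beta(P)$. ■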

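LEMMA 16. If an i-tree $\mathcal{T}$ is strongly consistent, then for all $S_1, S_2 \in \mathbb{S}$ such that $S_1 \neq \emptyset_d$ and $S_2 \neq \emptyset_d$ we have

$\neg\mathrm{disj}_{\mathcal{T}}(S_1, S_2) \Rightarrow \exists M.\ M \models \mathcal{T} \wedge \alpha_M(S_1) \cap \alpha_M(S_2) \neq \emptyset$.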



From the previously stated lemmas, we can prove Theorem 4. From Theorem 4 and Lemma 7 we conclude that the algorithm in Figure 12 is a correct and complete test for subsumption, not only between trees, but also between a tree and an arbitrary diagram.

8. Related Work 

Boolean algebras with cardinalities. The satisfiability problem for quantifier-free formulas of boolean algebra is NP-complete [33]. Quantified formulas of boolean algebra are in alternating exponential space with a linear number of alternations [21]. Cardinality constraints naturally arise in quantifier elimination for boolean algebras [31, 42, 43]. Quantifier elimination implies that each first-order formula of the language of boolean algebras is equivalent to some quantifier-free formula with constant cardinalities; however, quantifier elimination may introduce an exponential blowup. The first-order theory of boolean algebras of finite sets with symbolic cardinalities, or, equivalently, boolean algebras of sets with an equicardinality operator, is shown decidable in [14]. These results are repeated, motivated by constraint solving applications, in [23, 39], and a special case with quantification over elements only is presented in [44]. Upper and lower bounds on the complexity of this problem were shown in [22], which also introduces the name BAPA, for Boolean Algebra with Presburger Arithmetic. The quantifier-free case of BAPA was studied in [45] with an NEXPTIME decision procedure, which is also achieved as a special case of [23, 22]. The new decision procedure in the present paper improves this bound to PSPACE and gives insight into the problem by reducing it to boolean algebras with binary-encoded large cardinalities, and by showing that it is not necessary to explicitly construct all set partitions.

Several decidable fragments of set theory are studied in [10]. 
Cardinality constraints also occur in description logics [5] and 
two-variable logic with counting [35, 19, 38]. However, all logics 
of counting that we are aware of have complexity that is beyond 
PSPACE. 

We are not aware of any previously known fragments of boolean algebras of sets with cardinality constraints that have polynomial-time satisfiability or subsumption algorithms. Our polynomial-time result for i-trees is even more interesting in light of the fact that our constraints can express some "disjunction-like" properties such as $A = B \cup C$.

Set constraints. Set constraints [1, 3, 2, 6] are incomparable to the constraints studied in our paper. On the one hand, set constraints are interpreted over ground terms and contain operations that apply a given free function symbol to each element of the set, which makes them suitable for encoding type inference [4] and interprocedural analysis [20, 34]. Researchers have also explored the efficient computation of the subset relation for set constraints [13]. On the other hand, set constraints do not support cardinality operators that are useful in modelling databases [40, 11] and analysis of the sizes of data structures [29]. Tarskian constraints use uninterpreted function symbols instead of free function symbols and have very high complexity [18].

9. Conclusions 

Constraints on sets and relations are very useful for the analysis of software artifacts and their abstractions. Reasoning about sets and relations often involves reasoning about their sizes. For example, an integer field may be used to track the size of the set of objects stored in a data structure. In this paper, we have presented new complexity results and algorithms for solving constraints on boolean algebras of sets with symbolic and constant cardinality constraints. We have presented symbolic constraints and large constant constraints, given a more efficient algorithm for quantifier-free symbolic constraints, identified several sources of NP-hardness of constraints, and presented a new class of constraints for which satisfiability and entailment are solvable in polynomial time. We hope that our results will serve as concrete recipes and general guidance in the design of algorithms for constraint solving and program analysis.
A. Proofs
A.1 I-Diagrams

Lemma 3 I-diagrams have the same expressive power as CBAC constraints.

Proof. We translate an i-diagram into a CBAC constraint as follows. As in Figure 2, we note that $b_1 = b_2$ and $b_1 \subseteq b_2$ can be expressed in the form $|b| = 0$, so we may assume that they are part of CBAC. Similarly, $|b| \le k$ can be expressed as $b \subseteq s \wedge |s| = k$ for a fresh variable $s$, and $|b| \ge k$ can be expressed as $s \subseteq b \wedge |s| = k$. We translate $\bot_d$ into, e.g., $|\emptyset| = 1$. Next consider $\mathcal{D} = (\mathbb{S}, \emptyset_d, \mathrm{Sons}, \mathrm{Split}, \mathrm{Comp}, \mathrm{CInf}, \mathrm{CSup}, \Phi)$. For each $S \in \mathbb{S}$, let $\eta(S) \in S$ be a representative set name. For each $s_i \in S \setminus \{\eta(S)\}$ introduce the conjunct $s_i = \eta(S)$. Next, for each $S_i \in \mathrm{Sons}(S)$, introduce a conjunct $S_i \subseteq S$. For each $+P \in \Phi(S)$, introduce the conjunct $S \subseteq P$, and for each $-P \in \Phi(S)$ the conjunct $S \subseteq P^c$. Express the bounds using the conjuncts $|S| \le \mathrm{CSup}(S)$ and $|S| \ge \mathrm{CInf}(S)$. For each $Q \in \mathrm{Split}(S)$ and $S_1, S_2 \in Q$ where $S_1 \neq S_2$, introduce the conjunct $|S_1 \cap S_2| = 0$. For each $\{S_1, \ldots, S_n\} \in \mathrm{Comp}(S)$ introduce the conjunct $S = S_1 \cup \ldots \cup S_n$.

We translate a CBAC constraint into an i-diagram using the following observations. It is sufficient to translate the following boolean algebra expressions: $s_0 = s_1 \cup s_2$, $s_0 = s_1^c$, and $|s| = k$. We construct an i-diagram whose nodes are singletons. We pick one set variable $u$ to act as a universal set and put $\{s\} \in \mathrm{Sons}(\{u\})$ for every set variable $s$ in the i-diagram. We translate $s_0 = s_1 \cup s_2$ as $\{\{s_1\}, \{s_2\}\} \in \mathrm{Comp}(\{s_0\})$, and translate $s_0 = s_1^c$ as $\{\{s_0\}, \{s_1\}\} \in \mathrm{Split}(\{u\})$ and $\{\{s_0\}, \{s_1\}\} \in \mathrm{Comp}(\{u\})$. We translate $|s| = k$ as $\mathrm{CInf}(\{s\}) = k$ and $\mathrm{CSup}(\{s\}) = k$. Then for each satisfying assignment of CBAC there is a model for the constructed i-diagram where $\{u\}$ is interpreted as a universal set. Conversely, for a model of the i-diagram where $\alpha(\{s\}) = A$, we let $s \mapsto A \cap \alpha(\{u\})$. The result is an assignment that satisfies the original CBAC formula. ■
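The first half of the proof is a purely syntactic translation, so it can be sketched in a few lines. The following Python sketch assumes a hypothetical encoding of nodes (the class `Node` and the function `to_cbac` are illustrative names, not the paper's); it emits each conjunct as a string and, for brevity, writes the cardinality bounds directly instead of introducing the fresh variables used above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical encoding of one i-diagram node (not the paper's data structure)."""
    names: List[str]                                  # set names of the node; names[0] plays eta(S)
    sons: List[str] = field(default_factory=list)     # ids of the sons
    split: List[List[str]] = field(default_factory=list)
    comp: List[List[str]] = field(default_factory=list)
    cinf: int = 0
    csup: Optional[int] = None                        # None: no upper bound
    phi: List[str] = field(default_factory=list)      # literals such as "+P", "-P"


def to_cbac(nodes: Dict[str, Node]) -> List[str]:
    """Emit the conjuncts of the first half of the proof of Lemma 3 as strings."""
    rep = {nid: n.names[0] for nid, n in nodes.items()}
    out: List[str] = []
    for nid, n in nodes.items():
        r = rep[nid]
        out += [f"{s} = {r}" for s in n.names[1:]]            # s_i = eta(S)
        out += [f"{rep[son]} ⊆ {r}" for son in n.sons]         # S_i ⊆ S
        for lit in n.phi:                                     # S ⊆ P or S ⊆ P^c
            p = lit[1:]
            out.append(f"{r} ⊆ {p}" if lit[0] == "+" else f"{r} ⊆ {p}^c")
        if n.csup is not None:
            out.append(f"|{r}| <= {n.csup}")
        out.append(f"|{r}| >= {n.cinf}")
        for view in n.split:                                  # pairwise disjointness
            for i, a in enumerate(view):
                out += [f"|{rep[a]} ∩ {rep[b]}| = 0" for b in view[i + 1:]]
        for view in n.comp:                                   # S = S_1 ∪ ... ∪ S_n
            out.append(f"{r} = " + " ∪ ".join(rep[x] for x in view))
    return out
```

For a root `U` with one split view and one complete view `[A, B]`, the output contains `a ⊆ u`, `|a ∩ b| = 0` and `u = a ∪ b`, matching the conjuncts listed in the proof.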

Lemma 4 For every i-diagram $\mathcal{D}$ we can obtain an equivalent simple i-diagram using the polynomial-time algorithm in Figure 6.

Proof. We argue that the algorithm in Figure 6 produces a diagram that is i) well-formed, ii) simple, and iii) equivalent to the original diagram. We first observe that after step 2, the following two conditions hold:

C) If $Q \in \mathrm{Split}(S)$ and $S \in Q$, then $Q \subseteq \{S, \emptyset_d\}$. This condition holds because step 1.5 of the algorithm merges all nodes of $Q \setminus \{S\}$ with $\emptyset_d$ when $S \in Q \in \mathrm{Split}(S)$.

D) If $Q \in \mathrm{Comp}(S)$, then $Q \neq \emptyset$ (by step 1.3), and if $|Q| = 1$ then $Q = \{S\}$ (by step 1.6).

i) To see that the resulting i-diagram is well-formed, it suffices to check the conditions $\bigcup \mathrm{Split}(S) = \mathrm{Sons}(S)$ and $\bigcup \mathrm{Comp}(S) \subseteq \mathrm{Sons}(S)$. This condition is preserved by the factor-diagram construction (for any equivalence relation). It is preserved by step 3 for the following reason. The only nodes removed from $\mathrm{Sons}(S)$ are $\emptyset_d$ and $S$. These nodes do not appear in $\bigcup \mathrm{Comp}(S)$, because $\emptyset_d$ is removed from each view $Q$ and views with $S \in Q$ are removed. It remains to check that $\mathrm{Sons}(S) \subseteq \bigcup \mathrm{Split}(S)$ after step 3; this holds because condition (C) implies that no element other than $S$ and $\emptyset_d$ is lost from $\bigcup \mathrm{Split}(S)$ in step 3. The well-formedness condition is preserved by step 4 because this step does not change $\bigcup \mathrm{Comp}(S)$ or $\bigcup \mathrm{Split}(S)$. Step 5 does not violate this condition either, because it sets the components of $\emptyset_d$ to $\emptyset$.

ii) To see that the resulting diagram is simple, we show that it satisfies conditions a), ..., f) of Definition 6. After step 2 of the algorithm, the resulting factor-diagram has no cycles of length 2 or more; there are only, potentially, some self-cycles. These are eliminated in step 3 and no further edges are introduced. Hence, a) holds. Each of the remaining conditions is enforced in a certain step and not violated afterwards, according to the following table:

Condition   Enforced in step
b)          3
c)          4
d)          4
e)          4
f)          5

iii) We show that the semantics is preserved when executing each sequence of steps 1, ..., k for $2 \le k \le 5$, that is, each step preserves the semantics provided that it is executed after the previous steps.

k = 2. Each equality introduced into $\rho$ is a semantic consequence of the diagram, because:

1.1 $\alpha(S_1) \subseteq \alpha(S_2)$ and $\alpha(S_2) \subseteq \alpha(S_1)$,
1.2 $\alpha(S_1) \subseteq \alpha(\emptyset_d) = \emptyset$,
1.3 $\alpha(S_1) \subseteq \bigcup \emptyset = \emptyset$,
1.4 $\alpha(S_1) \cap \alpha(S_2) = \emptyset$ for $\alpha(S_1) = \alpha(S_2)$, so $\alpha(S_2) = \emptyset$,
1.5 $\alpha(S_1) \cap \alpha(S_2) = \emptyset$ for $\alpha(S_1) = \alpha(S)$ and $\alpha(S_2) \subseteq \alpha(S)$, so again $\alpha(S_2) = \emptyset$,
1.6 $\alpha(S_1) \subseteq \alpha(S)$ and $\alpha(S) \subseteq \bigcup \{\alpha(S_1)\} = \alpha(S_1)$.

It follows that the condition on equality of sets, as well as the conditions on CInf, CSup, Sons, $\Phi$, and Comp, are all semantically equivalent when applied to the original and to the factor diagram. The only semantic condition which can be lost in the factor-diagram construction is $\mathrm{disj}_{\mathcal{D}}(S_1, S_2)$ when the nodes $S_1$ and $S_2$ are merged, that is, when $(S_1, S_2) \in \rho$. However, in this case the disjointness condition follows from $\alpha(S_1) = \emptyset$, which is enforced in 1.3. Therefore, for the particular relation constructed in step 1, the factor-diagram construction is an equivalence-preserving transformation.

k = 3. We need to show that no information is lost by removing $\emptyset_d$ and $S$ from the sons, as well as from the split and complete views, of $S$. Clearly, removing $S$ and $\emptyset_d$ from $\mathrm{Sons}(S)$ does not change the subset conditions, because $\emptyset \subseteq \alpha(S)$ and $\alpha(S) \subseteq \alpha(S)$. Eliminating $\emptyset_d$ from $Q \in \mathrm{Comp}(S)$ is justified because the view has the same semantics with or without $\emptyset_d$. Dropping a view $Q \in \mathrm{Comp}(S)$ with $S \in Q$ is justified because in that case $\alpha(S) \subseteq \bigcup_{S_i \in Q} \alpha(S_i)$ holds trivially. Eliminating $\emptyset_d$ from $Q \in \mathrm{Split}(S)$ is justified because the intersection with the empty set is always empty, so this condition does not bring any new information. Finally, dropping a $Q \in \mathrm{Split}(S)$ with $S \in Q$ is justified because condition (C) implies that in such a case $Q \subseteq \{\emptyset_d, S\}$, so the Split condition is trivial.

k = 4. Removing $\{\emptyset_d\}$ from $\mathrm{Split}(S)$ preserves the semantics because such a view carries no information. Similarly, because all maximal views are preserved, removing their subsets does not change the semantics. For $\mathrm{Comp}(S)$, we consider two cases. In the first case, $\emptyset \notin \mathrm{Comp}(S)$. Then removing $\emptyset$ has no effect, and it is sound to remove all non-minimal views because they are implied by the minimal views. In the second case, $\emptyset \in \mathrm{Comp}(S)$. By condition D), we know that $Q \neq \emptyset$ after step 2, and the only node removed in step 3 is $\emptyset_d$, so it must have been the case that $Q = \{\emptyset_d\}$ after step 2. By condition D), we then have $S = \emptyset_d$. Because the semantic condition on Comp for $Q = \emptyset$ reduces to $\alpha(S) = \emptyset$, this condition brings no new information, so we can remove it.

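k = 5. Because $\alpha(\emptyset_d) = \emptyset$, setting $\mathrm{CSup}(\emptyset_d) = 0$ does not change the semantics, and similarly for $\Phi(\emptyset_d) = \emptyset$. We also know that $\mathrm{Sons}(\emptyset_d) \subseteq \{\emptyset_d\}$, because this condition is ensured by step 2 and is not violated afterwards. Because we have already observed that the diagram is well-formed, we conclude that every view in $\mathrm{Comp}(\emptyset_d)$ and in $\mathrm{Split}(\emptyset_d)$ is included in $\{\emptyset_d\}$, so setting these values to $\emptyset$ does not change the semantics. ■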

Theorem 2 Omitting any one of the three conditions of Definition 9 (1. being tree-shaped, 2. having independent views, and 3. having independent signatures) yields a class of diagrams whose satisfiability is NP-hard.

Proof. Suppose that at least one of the three conditions does not apply to a class of i-diagrams. We then give a reduction from the problem 3COL to the satisfiability of i-diagrams in this class. Here 3COL denotes the NP-complete problem of deciding, given an undirected graph, whether the graph can be colored using 3 colors such that adjacent nodes have different colors [41, page 275].

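Given a graph $(N, E)$ where $E \subseteq N \times N$ is a symmetric irreflexive relation, we first build the i-diagram $\mathcal{D}$ defined as follows: $\mathbb{S} = N \cup \{U\}$, $\mathrm{CInf}(U) = \mathrm{CSup}(U) = 3$, $\mathrm{Sons}(U) = N$, $\mathrm{Split}(U) = \{\{n\} \mid n \in N\}$, and $\mathrm{Comp}(U) = \Phi(U) = \emptyset$. For all $n \in N$ we let $\mathrm{CInf}(n) = \mathrm{CSup}(n) = 1$ and $\mathrm{Sons}(n) = \mathrm{Split}(n) = \mathrm{Comp}(n) = \Phi(n) = \emptyset$.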

Each model of this diagram is a triple $(A, \alpha, \Xi)$ such that $|\alpha(U)| = 3$ and, for all $n$, $\alpha(n)$ is a singleton included in $\alpha(U)$. If we consider $\alpha(U)$ as a set of three colors, then in each model with this property, $\alpha(n)$ indicates the color of node $n$.

Then for each edge $(n_1, n_2) \in E$ we encode the fact that $n_1$ and $n_2$ must have different colors by enforcing the property $\alpha(n_1) \cap \alpha(n_2) = \emptyset$ on the models of $\mathcal{D}$. We encode this constraint in different ways depending on the class of i-diagrams:

• If $\mathcal{D}$ allows dependent signatures, we introduce a fresh predicate symbol $P_{n_1,n_2}$, add $(+P_{n_1,n_2})$ to $\Phi(n_1)$, and add $(-P_{n_1,n_2})$ to $\Phi(n_2)$.

• If $\mathcal{D}$ allows dependent views, we add $\{n_1, n_2\}$ to $\mathrm{Split}(U)$.

• If $\mathcal{D}$ allows multiple fathers, then we simulate dependent views by introducing a new node $m(n_1, n_2)$. We let $\mathrm{CInf}(m(n_1,n_2)) = \mathrm{CSup}(m(n_1,n_2)) = 2$, $\mathrm{Sons}(m(n_1,n_2)) = \{n_1, n_2\}$, $\mathrm{Split}(m(n_1,n_2)) = \mathrm{Comp}(m(n_1,n_2)) = \{\{n_1, n_2\}\}$, and $\Phi(m(n_1,n_2)) = \emptyset$. We then remove $n_1, n_2$ from $\mathrm{Sons}(U)$ and add $m(n_1,n_2)$ to $\mathrm{Sons}(U)$ instead.

None of these constructions violates more than one of the three considered restrictions. It is straightforward to verify that the diagram is satisfiable iff the graph is 3-colorable, and that the construction of $\mathcal{D}$ can be done in polynomial time. This proves that the satisfiability of i-diagrams with any of the three restrictions removed is NP-hard. ■
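The dependent-views variant of the reduction can be checked on small graphs. The sketch below uses a hypothetical dictionary encoding (the names `reduce_3col` and `brute_force_sat` are illustrative, not from the paper): it builds the diagram described in the proof and decides its satisfiability by enumerating one singleton per node inside a fixed three-element $\alpha(U)$. The search is exponential and is meant only as an illustration of the equivalence, not as a decision procedure.

```python
from itertools import combinations, product


def reduce_3col(nodes, edges):
    """Build the i-diagram of Theorem 2, dependent-views variant (hypothetical encoding)."""
    split_u = [frozenset([n]) for n in nodes]           # Split(U) = {{n} | n in N}
    split_u += [frozenset(e) for e in edges]             # one view {n1, n2} per edge
    diagram = {"U": {"cinf": 3, "csup": 3, "sons": set(nodes), "split": split_u, "comp": []}}
    for n in nodes:
        diagram[n] = {"cinf": 1, "csup": 1, "sons": set(), "split": [], "comp": []}
    return diagram


def brute_force_sat(diagram, nodes):
    """Exhaustive satisfiability check for diagrams produced by reduce_3col."""
    # Every model has |alpha(U)| = 3, so w.l.o.g. alpha(U) = {0, 1, 2}, and each
    # alpha(n) is a singleton {c} with c in alpha(U); only Split(U) can fail.
    for colours in product(range(3), repeat=len(nodes)):
        alpha = {n: {c} for n, c in zip(nodes, colours)}
        if all(not (alpha[a] & alpha[b])
               for view in diagram["U"]["split"]
               for a, b in combinations(sorted(view), 2)):
            return True
    return False
```

For example, a triangle yields a satisfiable diagram, while the complete graph on four nodes does not, in line with the claim that the diagram is satisfiable iff the graph is 3-colorable.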

A.2 Termination of System $\mathcal{R}$

LEMMA 17 (Invariants of $\mathcal{R}$). For every $k \in [1..12]$ the rule $R_k$ preserves $\bigwedge_{i=1}^{k-1} C_i$ or returns $\bot_d$.



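Proof. We analyze each rule $R_k$ for two i-trees $\mathcal{T}$ and $\mathcal{T}'$ such that $\mathcal{T} \to_{R_k} \mathcal{T}'$, assuming that $\mathcal{T}$ satisfies $\bigwedge_{i=1}^{k-1} C_i$. More precisely, we write $\mathcal{T} \xrightarrow{\mathit{spot}}_{R_k} \mathcal{T}'$, where spot stands for the variable names as they appear in the definition of $\mathcal{R}$.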

1. (DnPhi). Trivial. 

2. (Unsat). (C1) does not depend on CSup.

3. (UpSup). If $n = 0$, (C2) is trivially true for $S$ in $\mathcal{T}'$. If $n > 0$, by (C3) $\mathrm{CSup}(S') > 0$, and we already have $\forall P \in PN.\ \{+P, -P\} \not\subseteq \Phi(S)$; since $\Phi' = \Phi$, (C2) holds in $\mathcal{T}'$.

4. (UpInf). None of (C1), (C2), (C3) depends on CInf.

5. (Error). $\mathcal{T}' = \bot_d$.

6. (DnInf). Only (C4) and (C5) depend on CInf.

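• (C5) is maintained for $S$ in $\mathcal{T}'$ because, noticing that (C5) is maintained for every node other than $S$, we have

$$\begin{aligned}
\mathrm{CInf}'(S) &= \mathrm{CInf}(S') - \Sigma(\mathrm{CSup}(Q_0)) \\
&\le \mathrm{CSup}(S') - \Sigma(\mathrm{CSup}(Q_0)) && \text{(by } C_5\text{)} \\
&\le \mathrm{CSup}(S) && \text{(by } C_3\text{)}
\end{aligned}$$

• To prove that (C4) is maintained in $\mathcal{T}'$, we need to check that (C4) is maintained for $S$, which is trivial by ($\alpha_6$), and that (C4) is maintained for the father $S'$ of $S$ and the views $Q \in \mathrm{Split}(S')$ containing $S$. By the property of independent views there exists only one such view $Q = \{S\} \uplus Q_0'$, and it satisfies $Q_0' \subseteq Q_0$, so

$$\begin{aligned}
\Sigma(\mathrm{CInf}'(Q)) &= \mathrm{CInf}'(S) + \Sigma(\mathrm{CInf}(Q_0')) \\
&= (\mathrm{CInf}(S') - \Sigma(\mathrm{CSup}(Q_0))) + \Sigma(\mathrm{CInf}(Q_0')) \\
&= \mathrm{CInf}(S') - \Sigma(\mathrm{CSup}(Q_0 - Q_0')) + \sum_{S_0' \in Q_0'} (\mathrm{CInf}(S_0') - \mathrm{CSup}(S_0')) \\
&\le \mathrm{CInf}(S') - \Sigma(\mathrm{CSup}(Q_0 - Q_0')) && \text{(by } C_5\text{)} \\
&\le \mathrm{CInf}(S') \\
&= \mathrm{CInf}'(S')
\end{aligned}$$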

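7. (DnSup). Only (C2), (C3) and (C5) depend on CSup. (C2) is maintained thanks to ($\alpha_7$), as in the case of rule UpSup.

• (C5) is maintained for $S$ in $\mathcal{T}'$ because, noticing that (C5) is maintained for every node other than $S$, we have

$$\begin{aligned}
\mathrm{CSup}'(S) &= \mathrm{CSup}(S') - \Sigma(\mathrm{CInf}(Q_0)) \\
&\ge \mathrm{CInf}(S') - \Sigma(\mathrm{CInf}(Q_0)) && \text{(by } C_5\text{)} \\
&\ge \mathrm{CInf}(S) && \text{(by } C_4\text{)}
\end{aligned}$$

• To prove that (C3) is maintained in $\mathcal{T}'$, we need to check that (C3) is maintained for $S$, which is trivial by ($\alpha_7$), and that (C3) is maintained for the father $S'$ of $S$ and the views $Q \in \mathrm{Comp}(S')$ containing $S$. By the property of independent views there exists only one such view $Q = \{S\} \uplus Q_0'$, and it satisfies $Q_0 \subseteq Q_0'$, so

$$\begin{aligned}
\Sigma(\mathrm{CSup}'(Q)) &= \mathrm{CSup}'(S) + \Sigma(\mathrm{CSup}(Q_0')) \\
&= (\mathrm{CSup}(S') - \Sigma(\mathrm{CInf}(Q_0))) + \Sigma(\mathrm{CSup}(Q_0')) \\
&= \mathrm{CSup}(S') + \Sigma(\mathrm{CSup}(Q_0' - Q_0)) + \sum_{S_0 \in Q_0} (\mathrm{CSup}(S_0) - \mathrm{CInf}(S_0)) \\
&\ge \mathrm{CSup}(S') && \text{(by } C_5\text{)} \\
&= \mathrm{CSup}'(S')
\end{aligned}$$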
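The two downward rules only subtract sums of bounds: a complete view $\{S\} \uplus Q_0$ gives $\mathrm{CInf}(S') - \Sigma(\mathrm{CSup}(Q_0))$ as a lower bound for $S$, and a split view $\{S\} \uplus Q_0$ gives $\mathrm{CSup}(S') - \Sigma(\mathrm{CInf}(Q_0))$ as an upper bound. The following is a minimal Python sketch, assuming that a rule is applied only when it tightens the current bound (modelled here with `max` and `min`) and omitting the side conditions $\alpha_6$ and $\alpha_7$; all function names are hypothetical.

```python
def dn_inf(cinf, csup, father, view, son):
    """DnInf bound: {son} ⊎ Q0 is a complete view of father, so the elements of
    father not covered by Q0 must lie in son: CInf(father) - Σ CSup(Q0)."""
    q0 = [x for x in view if x != son]
    return cinf[father] - sum(csup[x] for x in q0)


def dn_sup(cinf, csup, father, view, son):
    """DnSup bound: {son} ⊎ Q0 is a split view of father, so son is disjoint
    from Q0 inside father: CSup(father) - Σ CInf(Q0)."""
    q0 = [x for x in view if x != son]
    return csup[father] - sum(cinf[x] for x in q0)


def tighten_son(cinf, csup, father, comp_views, split_views, son):
    """Return the (possibly) tightened (CInf, CSup) of son; views not containing son are ignored."""
    new_inf = max([cinf[son]] + [dn_inf(cinf, csup, father, v, son)
                                 for v in comp_views if son in v])
    new_sup = min([csup[son]] + [dn_sup(cinf, csup, father, v, son)
                                 for v in split_views if son in v])
    return new_inf, new_sup
```

With a father of cardinality exactly 10 and a view $\{A, B\}$ that is both split and complete, where $B \in [3, 4]$, the bounds of $A$ tighten from $[0, 10]$ to $[6, 7]$.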

8. (CCmp). Only (C 3 ) and (Ca) depend on Comp. 

• (C 3 ) is maintained for S, Q because 

CSup'(S') = CSup(S') 

<E(Clnf(Q)) (by a 8 ) 
<E(CSup(Q)) (byC 5 ) 
= E(CSup'(Q)) 

• (Ce) is maintained for S, Q and all So G Q because 

Clnf'(5) = Clnf(S) 

<CSup(S) (byC 5 ) 

<E(Clnf(Q)) (by a 8 ) 

= Clnf(S , o) + E(Clnf(Q-{5 })) 

< Clnf(So) + E(CSup(Q - {So})) (by C 5 ) 
= Clnf'(So) + E(CSup'(Q - {So})) 

(Remark). If a simplification is applied afterwards, as indicated by the star in Figure 10, it can only consist in removing a complete view $Q'$ such that $Q' \subseteq Q$. This operation trivially maintains $\bigwedge_{i=1}^{7} C_i$, because in every consistency property in which complete views appear, they are universally quantified.

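9. (CSplit). Only (C4) and (C7) depend on Split.

• (C4) is maintained for $S, Q$ because

$$\begin{aligned}
\mathrm{CInf}'(S) &= \mathrm{CInf}(S) \\
&\ge \Sigma(\mathrm{CSup}(Q)) && \text{(by } \alpha_9\text{)} \\
&\ge \Sigma(\mathrm{CInf}(Q)) && \text{(by } C_5\text{)} \\
&= \Sigma(\mathrm{CInf}'(Q))
\end{aligned}$$

• (C7) is maintained for $S, Q$ and all $S_0 \in Q$ because

$$\begin{aligned}
\mathrm{CSup}'(S) &= \mathrm{CSup}(S) \\
&\ge \mathrm{CInf}(S) && \text{(by } C_5\text{)} \\
&\ge \Sigma(\mathrm{CSup}(Q)) && \text{(by } \alpha_9\text{)} \\
&= \mathrm{CSup}(S_0) + \Sigma(\mathrm{CSup}(Q - \{S_0\})) \\
&\ge \mathrm{CSup}(S_0) + \Sigma(\mathrm{CInf}(Q - \{S_0\})) && \text{(by } C_5\text{)} \\
&= \mathrm{CSup}'(S_0) + \Sigma(\mathrm{CInf}'(Q - \{S_0\}))
\end{aligned}$$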
(Remark). If a simplification is applied afterwards, as indicated by the star in Figure 10, it can only consist in removing a split view $Q'$ such that $Q' \subseteq Q$. This operation trivially maintains $\bigwedge_{i=1}^{8} C_i$, because in every consistency property in which split views appear, they are universally quantified.
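The side conditions $\alpha_8$ and $\alpha_9$, as they are used in the two cases above, have a simple counting reading. Disjoint sons whose lower bounds already reach $\mathrm{CSup}(S)$ must cover $S$, so a split view satisfying $\mathrm{CSup}(S) \le \Sigma(\mathrm{CInf}(Q))$ is also complete; and a cover whose upper bounds add up to at most $\mathrm{CInf}(S)$ cannot contain overlapping sets, so a complete view satisfying $\mathrm{CInf}(S) \ge \Sigma(\mathrm{CSup}(Q))$ is also split. A minimal Python sketch of the two tests follows; the function names are illustrative, and any further applicability conditions of the rules in Figure 10 are not modelled.

```python
def ccomp_applies(csup_father, cinf_view):
    """Split view Q of S is also complete if CSup(S) <= Σ CInf(Q): the disjoint
    sons already contain at least |S| elements, so their union is all of S."""
    return csup_father <= sum(cinf_view)


def csplit_applies(cinf_father, csup_view):
    """Complete view Q of S is also split if CInf(S) >= Σ CSup(Q): a cover of S
    whose sizes add up to at most |S| cannot contain overlapping sets."""
    return cinf_father >= sum(csup_view)
```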
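10. (UpPhi). Only (C1) and (C2) depend on $\Phi$.

• (C1) is maintained for $S$ and all $S' \in Q$ because

$$\begin{aligned}
\Phi'(S) &= \Phi(S) \cup \textstyle\bigcap \Phi(Q) \\
&= \Phi(S) \cup (\Phi(S') \cap \textstyle\bigcap \Phi(Q - \{S'\})) \\
&\subseteq \Phi(S) \cup \Phi(S') \\
&\subseteq \Phi(S') && \text{(by } C_1\text{)} \\
&= \Phi'(S')
\end{aligned}$$

• We can also prove that (C2) is maintained for $S$. Suppose there exists $P \in PN$ such that $\{+P, -P\} \subseteq \Phi'(S)$. We distinguish three cases.

- If $\{+P, -P\} \subseteq \Phi(S)$, then by C2, $\mathrm{CSup}(S) = 0$.

- If $\{+P, -P\} \cap \Phi(S) = \emptyset$, then $\{+P, -P\} \subseteq \bigcap \Phi(Q)$. Then for all $S'' \in Q$, $\mathrm{CSup}(S'') = 0$ by C2, and by C3, $\mathrm{CSup}(S) = 0$.

- If $\{+P, -P\} \cap \Phi(S) = \{\pm P\}$ for one atom $\pm P \in \{+P, -P\}$, then the opposite atom $\mp P$ belongs to $\bigcap \Phi(Q)$, and by C1, $\pm P$ also belongs to $\bigcap \Phi(Q)$. By C2, each node $S'' \in Q$ is such that $\mathrm{CSup}(S'') = 0$, and by C3, $\mathrm{CSup}(S) = 0$.

In the three cases $\mathrm{CSup}'(S) = \mathrm{CSup}(S) = 0$.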
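The update analysed in case 10 adds to $\Phi(S)$ every literal shared by all sons of a complete view $Q$, and the (C2) argument then rules out a node carrying both $+P$ and $-P$ unless its upper bound is 0. A minimal Python sketch of both operations, assuming signatures are encoded as sets of strings such as `"+P"` and `"-P"` (a hypothetical encoding; the function names are ours):

```python
def up_phi(phi_father, view_phis):
    """Φ'(S) = Φ(S) ∪ ⋂_{S' in Q} Φ(S') for one complete view Q of S.

    view_phis lists Φ(S') for the sons S' in Q; Q is assumed non-empty,
    since the simplification of Lemma 4 removes empty complete views.
    """
    if not view_phis:
        raise ValueError("complete view must be non-empty")
    common = set.intersection(*(set(p) for p in view_phis))
    return set(phi_father) | common


def violates_c2(phi):
    """True if some predicate P has both +P and -P in the signature phi."""
    return any(lit[0] == "+" and "-" + lit[1:] in phi for lit in phi)
```

For instance, a father with $\Phi(S) = \{+P\}$ and a complete view whose sons both carry $-Q$ obtains $\Phi'(S) = \{+P, -Q\}$, which does not violate (C2).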

11. (Void). If $S$ is the root, the resulting i-tree $\mathcal{T}'$ is such that $\mathbb{S}' = \{S_N\} = \{\emptyset_d\}$, and $C_i$ is trivial for all $i \in [1..11]$. Otherwise, we have to check that removing the node $S$ from the sons of the father of $S$ (as indicated in step 3 of the procedure simplify) maintains $\bigwedge_{i=1}^{10} C_i$. We denote by $S_0$ the father of $S$ and by $Q_0$ the split view of $S_0$ containing $S$. If $S$ is also contained in a complete view of $S_0$, we denote this complete view by $C_0$.

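• (C1) and (C2) are trivially maintained.

• (C3) is maintained because, if $C_0$ exists and $C_0' = C_0 - \{S\}$ is not empty,

$$\begin{aligned}
\mathrm{CSup}'(S_0) &= \mathrm{CSup}(S_0) \\
&\le \Sigma\mathrm{CSup}(C_0) && \text{(by } C_3\text{)} \\
&= \Sigma\mathrm{CSup}(C_0') && \text{(by } \alpha_{12}\text{)} \\
&= \Sigma\mathrm{CSup}'(C_0')
\end{aligned}$$

• (C4) is maintained because, if $Q_0' = Q_0 - \{S\}$ is not empty,

$$\begin{aligned}
\mathrm{CInf}'(S_0) &= \mathrm{CInf}(S_0) \\
&\ge \Sigma\mathrm{CInf}(Q_0) && \text{(by } C_4\text{)} \\
&\ge \Sigma\mathrm{CInf}(Q_0') \\
&= \Sigma\mathrm{CInf}'(Q_0')
\end{aligned}$$

• (C5) is maintained for the node $\emptyset_d' = S \cup \emptyset_d$ of $\mathcal{T}'$ because, by ($\alpha_{12}$) and (C5), $\mathrm{CInf}(S) \le \mathrm{CSup}(S) = 0$; by simplicity and (C5), $\mathrm{CInf}(\emptyset_d) \le \mathrm{CSup}(\emptyset_d) = 0$; and therefore $\mathrm{CInf}'(\emptyset_d') = \max(\mathrm{CInf}(S), \mathrm{CInf}(\emptyset_d)) = 0 \le \mathrm{CSup}'(\emptyset_d')$.

• (C6) is maintained for $S_0$, $C_0$ and every $S_i \in C_0$ such that $S_i \neq S$ because

$$\begin{aligned}
\mathrm{CInf}'(S_i) &= \mathrm{CInf}(S_i) \\
&\ge \mathrm{CInf}(S_0) - \Sigma(\mathrm{CSup}(C_0 - \{S_i\})) && \text{(by } C_6\text{)} \\
&= \mathrm{CInf}(S_0) - \Sigma(\mathrm{CSup}(C_0 - \{S\} - \{S_i\})) && \text{(by } \alpha_{12}\text{)} \\
&= \mathrm{CInf}'(S_0) - \Sigma(\mathrm{CSup}'(C_0' - \{S_i\}))
\end{aligned}$$

• (C7) is maintained for $S_0$, $Q_0$ and every $S_i \in Q_0$ such that $S_i \neq S$ because

$$\begin{aligned}
\mathrm{CSup}'(S_i) &= \mathrm{CSup}(S_i) \\
&\le \mathrm{CSup}(S_0) - \Sigma(\mathrm{CInf}(Q_0 - \{S_i\})) && \text{(by } C_7\text{)} \\
&\le \mathrm{CSup}(S_0) - \Sigma(\mathrm{CInf}(Q_0 - \{S\} - \{S_i\})) \\
&= \mathrm{CSup}'(S_0) - \Sigma(\mathrm{CInf}'(Q_0' - \{S_i\}))
\end{aligned}$$

• (C8) is maintained for $Q_0$ because

$$\begin{aligned}
\mathrm{CSup}'(S_0) &= \mathrm{CSup}(S_0) \\
&\ge \Sigma(\mathrm{CInf}(Q_0)) && \text{(by } C_8\text{)} \\
&\ge \Sigma(\mathrm{CInf}(Q_0 - \{S\})) \\
&= \Sigma(\mathrm{CInf}'(Q_0'))
\end{aligned}$$

• (C9) is maintained for $C_0$ (if it exists and $C_0' = C_0 - \{S\} \neq \emptyset$) because

$$\begin{aligned}
\mathrm{CInf}'(S_0) &= \mathrm{CInf}(S_0) \\
&\le \Sigma(\mathrm{CSup}(C_0)) && \text{(by } C_9\text{)} \\
&= \Sigma(\mathrm{CSup}(C_0 - \{S\})) && \text{(by } \alpha_{11}\text{)} \\
&= \Sigma(\mathrm{CSup}'(C_0'))
\end{aligned}$$

• (C10) is maintained for $C_0$ (if it exists and $C_0' = C_0 - \{S\} \neq \emptyset$) because

$$\begin{aligned}
\textstyle\bigcap \Phi'(C_0') &= \textstyle\bigcap \Phi(C_0 - \{S\}) \\
&\subseteq \textstyle\bigcap \Phi(C_0) \\
&\subseteq \Phi(S_0) && \text{(by } C_{10}\text{)} \\
&= \Phi'(S_0)
\end{aligned}$$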

12. (Equal). Before merging $S$ and $S'$ for $\{S'\} \in \mathrm{Comp}(S)$ we have

• $\mathrm{CInf}(S) = \mathrm{CInf}(S')$ by C4 and C6,

• $\mathrm{CSup}(S) = \mathrm{CSup}(S')$ by C3 and C7,

• $\Phi(S) = \Phi(S')$ by C1 and C10.

Therefore all the properties $C_i$ for $i = 1..10$ are trivially maintained.



A.3 Model Construction

Lemma 6 (Model Construction) If an i-tree $\mathcal{T}$ is weakly consistent, then we can construct a model for $\mathcal{T}$.

Proof. Let $\mathcal{T}$ be a weakly consistent i-tree.

We construct a model $(A, \alpha, \Xi)$ by first constructing a partial model $(A, \alpha)$ for all parts of $\mathcal{T}$ except $\Phi$, and then extending $(A, \alpha)$ with $\Xi$ to satisfy $\Phi$.

Constructing $(A, \alpha)$. We write $(A, \alpha) \models \mathcal{T}$ to denote that $A$ and $\alpha$ satisfy those conditions on $\mathbb{S}$, Sons, Split, Comp, CSup, CInf in Definition 2 that do not mention $\Phi$. To show that we can construct $(A, \alpha)$ such that $(A, \alpha) \models \mathcal{T}$, we prove the following more general claim by induction on $n$.

CLAIM 1. For every i-tree $\mathcal{T}$ of height $n$ with root $S_R$:

$$\forall k \in [\mathrm{CInf}(S_R), \mathrm{CSup}(S_R)].\ \exists (A, \alpha).\ (A, \alpha) \models \mathcal{T} \wedge |\alpha(S_R)| = |A| = k$$

If $n = 1$, the claim holds by C5, taking

$$A = \alpha(S_R) = \{1, \ldots, k\}$$

For $n > 1$, consider an i-tree $\mathcal{T}$ with root $S_R$ and $k \in [\mathrm{CInf}(S_R), \mathrm{CSup}(S_R)]$. By examining the constraints in $\mathcal{T}$, we choose the cardinalities for the subtrees of $\mathcal{T}$, use the induction hypothesis to construct models for these subtrees, and paste them into a model for $\mathcal{T}$. We decompose this process into three steps:

1. For each $C \in \mathrm{Comp}(S_R)$, consider a subtree $\mathcal{T}_C$ built from $\mathcal{T}$ by removing all sons outside $C$. We construct a model $M_C$ for $\mathcal{T}_C$ of cardinality $k$.

2. For each remaining $Q \in \mathrm{Split}(S_R)$ with $Q \not\subseteq \bigcup \mathrm{Comp}(S_R)$, consider a subtree $\mathcal{T}_Q$ with the same root $S_R$ but without the sons outside $Q$. We construct a model $M_Q$ for $\mathcal{T}_Q$ of cardinality $k$.

3. Because the constructed models have the same cardinality, we can easily merge them to obtain a model for $\mathcal{T}$.

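Step 1. Let $C \in \mathrm{Comp}(S_R)$, and let $\mathcal{C} \subseteq \mathrm{Split}(S_R)$ be such that $C = \bigcup \mathcal{C}$. For each $Q \in \mathcal{C}$, let $t_Q \stackrel{\mathrm{def}}{=} \min(k, \Sigma(\mathrm{CSup}[Q]))$. Using $k \ge \mathrm{CInf}(S_R)$ and C4 we can show

$$\Sigma(\mathrm{CInf}[Q]) \le t_Q \le \Sigma(\mathrm{CSup}[Q]) \qquad \text{(H1)}$$

To each node $S \in Q$ we can therefore assign an integer $K(S) \in [\mathrm{CInf}(S), \mathrm{CSup}(S)]$ such that $t_Q = \Sigma(K[Q])$. Let $\mathcal{T}_S$ be the sub-i-tree of $\mathcal{T}$ rooted at $S$. By the induction hypothesis, let $M_S$ be a model of $\mathcal{T}_S$ of cardinality $K(S)$. We can then take the disjoint union of these models to construct a model $M_Q$ of size $t_Q$ for the forest $\bigcup_{S \in Q} \mathcal{T}_S$.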
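The choice of $t_Q$ and of the cardinalities $K(S)$ is elementary arithmetic once (H1) holds. As a minimal sketch (the helper names are hypothetical, and the greedy strategy is only one of many valid choices, which matters because Remark 2 below relies on the freedom in this choice), one can start every node at its lower bound and raise nodes one after another until the sum reaches $t_Q$:

```python
def choose_t(k, csup_view):
    """t_Q = min(k, Σ CSup[Q]) as in Step 1."""
    return min(k, sum(csup_view))


def distribute(t, bounds):
    """Pick K(S) in [CInf(S), CSup(S)] for each S of a view so that Σ K = t.

    bounds is a list of (CInf, CSup) pairs; such a choice exists exactly when
    (H1) holds. Start from the lower bounds and raise nodes greedily.
    """
    low = sum(lo for lo, _ in bounds)
    high = sum(hi for _, hi in bounds)
    if not low <= t <= high:
        raise ValueError("(H1) does not hold")
    ks, rest = [lo for lo, _ in bounds], t - low
    for i, (lo, hi) in enumerate(bounds):
        step = min(rest, hi - lo)
        ks[i] += step
        rest -= step
    return ks
```

For bounds $[1,3]$, $[0,2]$, $[2,2]$ and $k = 5$ we get $t_Q = 5$ and $K = (3, 0, 2)$.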

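For all $Q \in \mathcal{C}$, we have $t_Q \le k$ by definition of $t_Q$. We can also prove that $k \le \sum_{Q \in \mathcal{C}} t_Q$. Indeed, if there exists a $Q_0 \in \mathcal{C}$ such that $t_{Q_0} = k$, this is trivial. Otherwise, $t_Q = \Sigma(\mathrm{CSup}[Q])$ for every $Q \in \mathcal{C}$, and because $\mathcal{T}$ is weakly consistent, from C3 we obtain $k \le \mathrm{CSup}(S_R) \le \Sigma(\mathrm{CSup}[C])$, where

$$\Sigma(\mathrm{CSup}[C]) = \sum_{Q \in \mathcal{C}} \Sigma(\mathrm{CSup}[Q]) = \sum_{Q \in \mathcal{C}} t_Q.$$

We finally obtain

$$\max_{Q \in \mathcal{C}} t_Q \le k \le \sum_{Q \in \mathcal{C}} t_Q \qquad \text{(H2)}$$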



Thanks to (H2), we can build a model for the i-tree $\mathcal{T}_C$ as follows. We start with the disjoint union of the models $M_Q$ for $\mathcal{T}_Q$, $Q \in \mathcal{C}$. This model has cardinality $\sum_{Q \in \mathcal{C}} t_Q$. Then we rename elements from different models to be identical to elements from other models. Such merging is possible as long as there is no model whose domain contains the domains of all others, so we can reach any cardinality $k$ with $\max_{Q \in \mathcal{C}} t_Q \le k$.
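One concrete way to perform this merging is to place the elements of the models side by side on a circle of $k$ positions. In the sketch below, each model's domain is taken to be $\{0, \ldots, t_Q - 1\}$ (an assumption made for illustration), and the function name is ours. Each model receives consecutive positions modulo $k$, so its embedding is injective because $t_Q \le k$, and since the positions used add up to $\sum_{Q} t_Q \ge k$, every one of the $k$ elements is used; both inequalities are exactly (H2).

```python
def merge_models(sizes, k):
    """Embed models with domains {0..t-1} into a common domain {0..k-1}.

    Assumes (H2): max(sizes) <= k <= sum(sizes). Model i's elements go to
    consecutive positions modulo k, so each embedding is injective (t_i <= k)
    and, because the positions used add up to at least k, every element of
    {0..k-1} is hit.
    """
    if not (max(sizes) <= k <= sum(sizes)):
        raise ValueError("(H2) fails")
    maps, start = [], 0
    for t in sizes:
        maps.append({x: (start + x) % k for x in range(t)})
        start += t
    return maps
```

For sizes $3, 2, 4$ and $k = 5$, the three embeddings are injective and their images cover $\{0, \ldots, 4\}$.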

REMARK 2. (Freedom in the choice of $\{t_i\}_{i \in [1..n]}$) In Section 7 we enforce some additional properties on models using a different choice of $t_Q$. Such a construction is possible whenever the $t_Q$ satisfy (H1) and (H2). Moreover, if $K(S)$ denotes some chosen cardinality for each node $S$, and the values $K(S)$ satisfy certain assumptions, then we can enforce additional properties when merging the models $M_Q$ corresponding to $Q \in \mathrm{Split}(S)$. The following two cases are of interest.

1. If $\sum_{Q \in \mathcal{C}} t_Q > K(S)$, we can choose any pair of different split views $Q_1, Q_2 \in \mathcal{C}$ and two elements, $x_1$ from $M_{Q_1}$ and $x_2$ from $M_{Q_2}$, and decide to merge them.

2. If $\mathcal{C} = \{Q_0\} \uplus \mathcal{C}_0$ and $\max_{Q \in \mathcal{C}_0} t_Q < K(S)$, we can choose any element in the model $M_{Q_0}$ and decide not to merge it with any of the elements of the models $M_{Q'}$ for $Q' \in \mathcal{C}_0$.

Step 2. Let $Q \in \mathrm{Split}(S_R)$ be such that $Q$ is not included in any complete view. We construct a model $M_Q$ of size $k$ for the i-forest $\mathcal{T}_Q$ by first building a model of size $K(S') = \mathrm{CInf}(S')$ for each $S' \in Q$. Because $k \ge \mathrm{CInf}(S_R) \ge \Sigma(\mathrm{CInf}[Q])$ by C4, the disjoint union of these models has cardinality at most $k$. By adding the correct number of fresh elements to $A$, we obtain a model of cardinality $k$.

REMARK 3. (Existence of fresh elements) In Section 7 we use the following property: for all $Q \in \mathrm{Split}(S)$ with $Q \not\subseteq \bigcup \mathrm{Comp}(S)$, if $\Sigma(\mathrm{CInf}[Q]) < K(S)$, then there exists a model such that $\alpha(S)$ contains an element which does not belong to $\alpha(S')$ for any of the sons $S'$ of $S$.

Step 3. We can apply an arbitrary bijection $\sigma_C : A_C \to [1..k]$ to each model $M_C$ constructed as described above to build a model for the entire i-tree. We let $\alpha(S_R) = [1..k]$ and, for all $S \neq S_R$, $\alpha(S) = \sigma_C(\alpha_C(S))$, where $C$ is the view in $\mathrm{Split}(S_R)$ or $\mathrm{Comp}(S_R)$ containing an ancestor of $S$.

REMARK 4. (Freedom in the choice of $\sigma$.) If we know that $K(S) > 0$, then for any pair $S_1, S_2$ of sons of $S$ such that $S_1, S_2$ belong neither to the same split view nor to the same complete view, and for each choice of elements $x_1, x_2$ in the models of $\mathcal{T}_{S_1}$ and $\mathcal{T}_{S_2}$, we can choose $\sigma_1, \sigma_2$ such that $\sigma_1(x_1) = \sigma_2(x_2) \stackrel{\mathrm{def}}{=} x$, and the resulting model will be such that $x \in \alpha(S_1) \cap \alpha(S_2)$.

Extending the model with $\Xi$. Let $(A, \alpha) \models \mathcal{T}$. Then for each $P \in PN$, define

$$\Xi(P) = \bigcup \{\alpha(S) \mid S \in \mathbb{S} \wedge (+P) \in \Phi(S)\}$$

Then $(+P) \in \Phi(S) \Rightarrow \alpha(S) \subseteq \Xi(P)$ holds by construction; it remains to show $(-P) \in \Phi(S) \Rightarrow \alpha(S) \subseteq \Xi(P)^c$ for every node $S$. Consider $S_1 \in \mathbb{S}$ such that $(-P) \in \Phi(S_1)$. If $S_1 = \emptyset_d$, then $\alpha(S_1) = \emptyset$, so the condition trivially holds. Similarly, if $(+P) \in \Phi(S_1)$, then by C2, $\mathrm{CSup}(S_1) = 0$, so $\alpha(S_1) = \emptyset$ and the condition holds. Otherwise, assume $(+P) \notin \Phi(S_1)$. For the sake of contradiction, suppose that there exists an element $x \in \alpha(S_1)$ with $x \in \Xi(P)$. By definition of $\Xi(P)$, there exists a node $S_2 \neq S_1$ such that $x \in \alpha(S_2)$ and $(+P) \in \Phi(S_2)$. By the condition on independent signatures, one of the following two cases applies.

1. $\mathrm{disj}_{\mathcal{T}}(S_1, S_2)$. Then $\alpha(S_1) \cap \alpha(S_2) = \emptyset$ by the semantics of i-diagrams, which contradicts $x \in \alpha(S_1)$ and $x \in \alpha(S_2)$.

2. $S_1$ and $S_2$ have compatible signatures. Then there exists a node $S$ such that $S_1 \rightsquigarrow^* S$, $S_2 \rightsquigarrow^* S$ and $\mathrm{Sig}(S_1) \cap \mathrm{Sig}(S_2) \subseteq \mathrm{Sig}(S)$. Because $(-P) \in \Phi(S_1)$, we have $P \in \mathrm{Sig}(S_1)$, and because $(+P) \in \Phi(S_2)$, we have $P \in \mathrm{Sig}(S_2)$. Therefore $P \in \mathrm{Sig}(S)$. We have two cases:

(a) $(+P) \in \Phi(S)$. Then by C1, $(+P) \in \Phi(S_1)$, a contradiction.

(b) $(-P) \in \Phi(S)$. Then by C1, $(-P) \in \Phi(S_2)$, and by C2, $\mathrm{CSup}(S_2) = 0$, so $\alpha(S_2) = \emptyset$, which contradicts $x \in \alpha(S_2)$.

We have reached a contradiction in each case, so we conclude

$$\alpha(S_1) \subseteq \Xi(P)^c. \quad ■$$
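The definition of $\Xi$ is directly computable from $\alpha$ and $\Phi$. As a minimal sketch, assuming a hypothetical encoding of $\alpha$ as a map from node names to Python sets and of $\Phi$ as sets of literal strings, the following computes $\Xi(P)$ and checks both signature conditions. The proof above guarantees that the check succeeds for a weakly consistent i-tree with independent signatures; the second test case below deliberately violates the i-tree constraints to show what a failure looks like.

```python
def extend_with_xi(alpha, phi, predicates):
    """Ξ(P) = ⋃{α(S) | +P ∈ Φ(S)} for every predicate name P."""
    return {p: set().union(*(alpha[s] for s in alpha if "+" + p in phi[s]))
            for p in predicates}


def respects_phi(alpha, phi, xi):
    """Check +P ∈ Φ(S) ⇒ α(S) ⊆ Ξ(P) and -P ∈ Φ(S) ⇒ α(S) ∩ Ξ(P) = ∅."""
    for s, lits in phi.items():
        for lit in lits:
            sign, p = lit[0], lit[1:]
            if sign == "+" and not alpha[s] <= xi[p]:
                return False
            if sign == "-" and alpha[s] & xi[p]:
                return False
    return True
```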

A.4 Details of the Proofs for Subsumption Completeness

A.4.1 Refinements of Lemma 6

According to the remarks in the proof of Lemma 6, if an i-tree $\mathcal{T}$ is weakly consistent, there exists a choice of cardinalities $K : \mathbb{S} \to \mathbb{N}$ such that we can build a model $(A, \alpha, \Xi)$ for $\mathcal{T}$ with the property $|\alpha(S)| = K(S)$ for all $S \in \mathbb{S}$. For a fixed choice of cardinalities $K$, we can, in certain cases, enforce some additional properties by choosing which elements we merge in steps 1 and 3 of the construction. The three following lemmas are based on this idea.

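LEMMA 18 (Non-empty intersection (1)). Let $S_1$ and $S_2$ be nodes in a weakly consistent i-tree $\mathcal{T}$ such that

$$\begin{aligned}
& \mathrm{CInf}(S_1) > 0 \wedge S_1 \rightsquigarrow^* S_1' \rightsquigarrow S \wedge S_1' \in Q_1 \wedge {} \\
& \mathrm{CInf}(S_2) > 0 \wedge S_2 \rightsquigarrow^* S_2' \rightsquigarrow S \wedge S_2' \in Q_2
\end{aligned}$$

for some $S, S_1', S_2' \in \mathbb{S}$ and $Q_1, Q_2 \in \mathrm{Split}(S)$ where $Q_1 \neq Q_2$ and $\neg(\exists C \in \mathrm{Comp}(S).\ Q_1 \subseteq C \wedge Q_2 \subseteq C)$. Then there exists a model $(A, \alpha, \Xi)$ for $\mathcal{T}$ such that

$$\alpha(S_1) \cap \alpha(S_2) \neq \emptyset.$$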

Proof. We use the construction described in Lemma 6, with the exception of step 3 of the construction of the model $M_S$ for the subtree $\mathcal{T}_S$ with root $S$, where we do the following:

• We choose an element $x_1 \in \alpha_1(S_1)$ in the model $M_{S_1'}$ built for $\mathcal{T}_{S_1'}$ (such an $x_1$ exists because $\mathrm{CInf}(S_1) > 0$).

• Analogously, we choose an element $x_2 \in \alpha_2(S_2)$ in the model $M_{S_2'}$ built for $\mathcal{T}_{S_2'}$.

• We choose the bijections $\sigma_{Q_1}$ and $\sigma_{Q_2}$ in step 3 such that $\sigma_{Q_1}(x_1) = \sigma_{Q_2}(x_2)$. ■

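LEMMA 19 (Non-empty intersection (2)). Let $S_1$ and $S_2$ be nodes in a weakly consistent i-tree $\mathcal{T}$ such that

$$\begin{aligned}
& \mathrm{CInf}(S_1) > 0 \wedge S_1 \rightsquigarrow^* S_1' \rightsquigarrow S \wedge S_1' \in Q_1 \wedge {} \\
& \mathrm{CInf}(S_2) > 0 \wedge S_2 \rightsquigarrow^* S_2' \rightsquigarrow S \wedge S_2' \in Q_2
\end{aligned}$$

for some $S, S_1', S_2' \in \mathbb{S}$ and $Q_1, Q_2 \in \mathrm{Split}(S)$ where $Q_1 \neq Q_2$, and $Q_1 \subseteq C$, $Q_2 \subseteq C$ for some $C \in \mathrm{Comp}(S)$ with the property

$$\mathrm{CSup}(S) < \Sigma(\mathrm{CSup}[C]).$$

Then there exists a model $(A, \alpha, \Xi)$ for $\mathcal{T}$ such that

$$\alpha(S_1) \cap \alpha(S_2) \neq \emptyset.$$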

Proof. We use the construction described in the proof of Lemma 6, except for step 1 of the construction of the model $M_S$ for the subtree $\mathcal{T}_S$ with root $S$, for which we do the following. From $K(S) \le \mathrm{CSup}(S) < \Sigma(\mathrm{CSup}[C])$, we conclude that the choice of cardinalities $t_Q$ for $Q \in \mathcal{C}$ in the proof of Lemma 6 is such that $\sum_{Q \in \mathcal{C}} t_Q > K(S)$, by considering two cases.

1. There exists $Q \in \mathcal{C}$ such that

$$t_Q = \min(K(S), \Sigma(\mathrm{CSup}[Q])) = K(S).$$

Then we choose $Q_i \in \{Q_1, Q_2\}$ such that $Q_i \neq Q$. Since $\mathrm{CInf}(S_i) > 0 \wedge S_i \rightsquigarrow^* S_i' \rightsquigarrow S$, repeatedly applying C4 and using C5, we have

• $\Sigma(\mathrm{CSup}[Q_i]) \ge \mathrm{CSup}(S_i') \ge \mathrm{CInf}(S_i') \ge \mathrm{CInf}(S_i) > 0$

• $K(S) \ge \mathrm{CInf}(S) \ge \mathrm{CInf}(S_i') > 0$

As a consequence, $t_{Q_i} = \min(K(S), \Sigma(\mathrm{CSup}[Q_i])) > 0$ and $\sum_{Q'' \in \mathcal{C}} t_{Q''} \ge t_Q + t_{Q_i} > K(S)$.

2. For all $Q \in \mathcal{C}$, $t_Q = \Sigma(\mathrm{CSup}[Q])$. Then $\sum_{Q \in \mathcal{C}} t_Q = \Sigma(\mathrm{CSup}[C]) > \mathrm{CSup}(S) \ge K(S)$.

Because $\sum_{Q \in \mathcal{C}} t_Q > K(S)$, we can apply Remark 2 and choose one element $x_1 \in \alpha_1(S_1)$ in the model $M_{S_1'}$ built for $\mathcal{T}_{S_1'}$ and an element $x_2 \in \alpha_2(S_2)$ in the model built for $\mathcal{T}_{S_2'}$, and decide to merge these elements in step 1 of the construction. We know that such elements $x_1, x_2$ exist because $\mathrm{CInf}(S_1) > 0$ and $\mathrm{CInf}(S_2) > 0$. ■

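LEMMA 20 (Isolated element). Let $\mathcal{T}$ be a weakly consistent i-tree and $(\mathcal{Q}_1, \rightsquigarrow^*)$ a subtree of $(\mathbb{S}, \rightsquigarrow^*)$ with the same root $S_R$ as $\mathcal{T}$, such that for all $S \in \mathcal{Q}_1$ all of the following conditions hold:

1. $\mathrm{CInf}(S) > 0$

2. $\forall C \in \mathrm{Comp}(S).\ \exists^{=1} Q \in \mathrm{Split}(S).\ Q \subseteq C \wedge |Q \cap \mathcal{Q}_1| = 1$

3. $\forall Q \in \mathrm{Split}(S).\ |Q \cap \mathcal{Q}_1| \neq 1 \Rightarrow (|Q \cap \mathcal{Q}_1| = 0 \wedge \Sigma(\mathrm{CInf}[Q]) < \mathrm{CInf}(S))$

Then we can construct a model for $\mathcal{T}$ such that

$$\exists x \in \alpha(S_R).\ \forall S \in \mathbb{S}.\ (x \in \alpha(S) \Leftrightarrow S \in \mathcal{Q}_1)$$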



Proof. We use a variation of the construction in the proof of Lemma 6. We apply the assumptions about the subtree $\mathcal{Q}_1$ to show that we always have enough "slack" to avoid merging one specific element from $\mathcal{Q}_1$ with the elements of its neighbors. We begin by describing a slightly modified Step 1 of the proof of Lemma 6.

Step 1' (for nodes of $\mathcal{Q}_1$). Consider the root $S_R \in \mathcal{Q}_1$. Let $C \in \mathrm{Comp}(S_R)$, and let $\mathcal{C} \subseteq \mathrm{Split}(S_R)$ be such that $C = \bigcup \mathcal{C}$. By construction, there exists a unique son $S' \in \mathcal{Q}_1$ of $S_R$ and a corresponding split view $Q'$ such that $S' \in Q'$ and $\mathcal{C} = \{Q'\} \uplus \mathcal{C}_0$. We define $t_{Q'} \stackrel{\mathrm{def}}{=} \min(k, \Sigma(\mathrm{CSup}[Q']))$ and, for each $Q_0 \in \mathcal{C}_0$, we define $t_{Q_0} \stackrel{\mathrm{def}}{=} \min(k - 1, \Sigma(\mathrm{CSup}[Q_0]))$. For each $Q_0 \in \mathcal{C}_0$, since $k \ge \mathrm{CInf}(S_R)$ by the choice of $k$ and $\mathrm{CInf}(S_R) > \Sigma(\mathrm{CInf}[Q_0])$ by hypothesis 3 on $\mathcal{T}$, we have $\Sigma(\mathrm{CInf}[Q_0]) \le k - 1$, so

$$\Sigma(\mathrm{CInf}[Q_0]) \le t_{Q_0} \le \Sigma(\mathrm{CSup}[Q_0]) \qquad \text{(H1)}$$

and property (H1) also holds for $t_{Q'}$, because $t_{Q'}$ is defined as in Lemma 6.

By definition of the $t_Q$ we clearly have $\max_{Q \in \mathcal{C}} t_Q \le k$. We next show $\sum_{Q \in \mathcal{C}} t_Q \ge k$ by considering the following cases.

• $t_{Q'} = k$. Then the claim is obvious.

• For all $Q \in \mathcal{C}$ we have $t_Q = \Sigma(\mathrm{CSup}[Q])$. The claim follows from C3 and the choice of $k$, because $\sum_{Q \in \mathcal{C}} t_Q \ge \Sigma(\mathrm{CSup}[C]) \ge \mathrm{CSup}(S_R) \ge k$.

• There exists $Q_0 \in \mathcal{C}_0$ such that $t_{Q_0} = k - 1$. Using $\Sigma(\mathrm{CSup}[Q']) \ge \mathrm{CSup}(S') > 0$ and $k \ge \mathrm{CInf}(S_R) > 0$, we obtain $t_{Q'} > 0$, so

$$\sum_{Q \in \mathcal{C}} t_Q \ge t_{Q_0} + t_{Q'} \ge (k - 1) + 1 \ge k.$$

We finally obtain

$$\max_{Q \in \mathcal{C}} t_Q \le k \le \sum_{Q \in \mathcal{C}} t_Q \qquad \text{(H2)}$$

By definition of all $t_Q$ we then have

$$\forall Q_0 \in \mathcal{C}_0.\ t_{Q_0} < k \qquad \text{(H3)}$$

According to Remark 2, (H3) allows us to choose an element of the model $M_{S'}$ constructed for the subtree $\mathcal{T}_{S'}$ and decide not to merge it with any other element. This observation allows us to recursively enforce $x \in \alpha(S) \Leftrightarrow S \in \mathcal{Q}_1$.

Indeed, consider a node $S_R \in \mathcal{Q}_1$ and let $\{S_R^1, \ldots, S_R^p\} = \mathrm{Sons}(S_R) \cap \mathcal{Q}_1$ be its sons in $\mathcal{Q}_1$. For each $i$, we can then recursively ensure $x_i \in \alpha(S) \Leftrightarrow S \in \mathcal{Q}_1$ for each $S$ in the subtree rooted at $S_R^i$, i.e., for each $S$ for which $S \rightsquigarrow^* S_R^i$. By definition of $\mathcal{Q}_1$, each $S_R^i$ is in a different complete view, so we can apply bijections to the submodels (Remark 4) and let $\sigma_1(x_1) = \ldots = \sigma_p(x_p) = x$. We ensure that $x$ does not belong to any subtree rooted at a node $S_0 \in \mathrm{Sons}(S_R) \setminus \mathcal{Q}_1$, using Remark 2 to make sure that $x$ is not merged with any of the elements of $\alpha(S_0)$, which is possible thanks to (H3). Finally, for the base case, when $S$ has no sons, we pick $x$ to be a fresh element, which is possible by assumption 3 on the subtree $\mathcal{Q}_1$, as noted in Remark 3. ■

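A.4.2 Links between weak and strong consistency

Lemma 8 (Bounds Refinement) Let $\mathcal{T}$ be a strongly consistent i-tree, $S \in \mathbb{S}$, and $i, s$ such that $\mathrm{CInf}(S) \le i \le s \le \mathrm{CSup}(S)$; let $\mathcal{T}' = \mathcal{T}[\mathrm{CInf}(S) \leftarrow i, \mathrm{CSup}(S) \leftarrow s]$ and $\mathcal{T}'_{NF} = \mathcal{R}_{NF}(\mathcal{T}')$. Then 1) $\mathcal{T}'_{NF} \neq \bot_d$, 2) $\mathcal{T}'_{NF} \models \mathcal{T}$, and 3) if $\neg(S \rightsquigarrow^* S_0)$, then

$$(\mathrm{CInf}(S_0), \mathrm{CSup}(S_0)) = (\mathrm{CInf}'_{NF}(S_0), \mathrm{CSup}'_{NF}(S_0)).$$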

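Proof.

1. We prove this result by induction on the depth of $S$ in the tree $(\mathbb{S}, \rightsquigarrow)$. The key step of this proof is to show that the application of UpSup and/or UpInf to the father $S'$ of $S$ does not produce a situation where $\alpha_5$ holds in the resulting diagram $\mathcal{T}''$ (and therefore the rule Error is not applicable in $\mathcal{T}''$). We distinguish three cases:

• When UpSup and UpInf are both applicable to this node $S'$, we have $\mathrm{CInf}''(S') \le \mathrm{CSup}''(S')$ because, for $C \in \mathrm{Comp}(S')$ and $Q \in \mathrm{Split}(S')$ such that $Q \subseteq C$, $Q = \{S\} \uplus Q_0$, $C = \{S\} \uplus C_0$, and $\mathcal{T}' \xrightarrow{\mathrm{UpSup}} \cdot \xrightarrow{\mathrm{UpInf}} \mathcal{T}''$, we have

$$\begin{aligned}
\mathrm{CInf}''(S') &= \Sigma(\mathrm{CInf}'[Q_0]) + \mathrm{CInf}'(S) && \text{(by } \beta_4\text{)} \\
&\le \Sigma(\mathrm{CInf}'[Q_0]) + \mathrm{CSup}'(S) && (i \le s) \\
&\le \Sigma(\mathrm{CSup}'[Q_0]) + \mathrm{CSup}'(S) && \text{(by } C_5\text{)} \\
&\le \Sigma(\mathrm{CSup}'[C_0]) + \mathrm{CSup}'(S) && (Q_0 \subseteq C_0) \\
&= \mathrm{CSup}''(S') && \text{(by } \beta_3\text{)}
\end{aligned}$$

• When only UpSup is applicable to this node $S'$, we have $\mathrm{CInf}''(S') \le \mathrm{CSup}''(S')$ because, for $C \in \mathrm{Comp}(S')$ with $C = \{S\} \uplus C_0$ and $\mathcal{T}' \xrightarrow{\mathrm{UpSup}} \mathcal{T}''$, we have

$$\begin{aligned}
\mathrm{CSup}''(S') &= \Sigma(\mathrm{CSup}'[C_0]) + \mathrm{CSup}'(S) && \text{(by } \beta_3\text{)} \\
&\ge \Sigma(\mathrm{CSup}'[C_0]) + \mathrm{CInf}'(S) && (i \le s) \\
&\ge \mathrm{CInf}'(S') && \text{(by } C_6\text{)} \\
&= \mathrm{CInf}''(S')
\end{aligned}$$

• When only UpInf is applicable to this node $S'$, we have $\mathrm{CInf}''(S') \le \mathrm{CSup}''(S')$ because, for $Q \in \mathrm{Split}(S')$ with $Q = \{S\} \uplus Q_0$ and $\mathcal{T}' \xrightarrow{\mathrm{UpInf}} \mathcal{T}''$, we have

$$\begin{aligned}
\mathrm{CInf}''(S') &= \Sigma(\mathrm{CInf}'[Q_0]) + \mathrm{CInf}'(S) && \text{(by } \beta_4\text{)} \\
&\le \Sigma(\mathrm{CInf}'[Q_0]) + \mathrm{CSup}'(S) && (i \le s) \\
&\le \mathrm{CSup}'(S') && \text{(by } C_7\text{)} \\
&= \mathrm{CSup}''(S')
\end{aligned}$$

2. Follows easily from the hypothesis $\mathrm{CInf}(S) \le i \le s \le \mathrm{CSup}(S)$ and the fact that $\mathcal{R}_{NF}$ is semantics-preserving.

3. It is enough to notice that only the rules UpInf and UpSup are used when applying $\mathcal{R}_{NF}$, and these rules are applied in the bottom-up direction. ■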
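The proof of Lemma 8 only needs the upward rules, applied from the refined node toward the root. As a minimal sketch, assuming that UpInf sets $\mathrm{CInf}(S')$ to $\Sigma(\mathrm{CInf}[Q])$ for a split view $Q$ and UpSup sets $\mathrm{CSup}(S')$ to $\Sigma(\mathrm{CSup}[C])$ for a complete view $C$ whenever this tightens the bound (the equalities used in case 1 above), the propagation can be written as one bottom-up traversal that reports the situation in which Error would fire. The encoding and the name `up_bounds` are hypothetical.

```python
def up_bounds(root, sons, cinf, csup, split, comp):
    """Bottom-up UpInf/UpSup sketch (hypothetical encoding).

    sons maps a node to its list of sons; split and comp map a node to its
    views (lists of sons). Returns updated copies (cinf, csup), or None when
    some node ends with CInf > CSup, the situation in which Error would fire.
    """
    cinf, csup = dict(cinf), dict(csup)

    def visit(s):
        for son in sons.get(s, []):
            visit(son)                         # finish the sons first
        for q in split.get(s, []):             # UpInf: disjoint sons give a lower bound
            cinf[s] = max(cinf[s], sum(cinf[x] for x in q))
        for c in comp.get(s, []):              # UpSup: a cover gives an upper bound
            csup[s] = min(csup[s], sum(csup[x] for x in c))

    visit(root)
    if any(cinf[s] > csup[s] for s in cinf):
        return None
    return cinf, csup
```

Refining a son $A$ from $[2, 4]$ to $[4, 4]$ under a root with one view $\{A, B\}$, $B \in [3, 5]$, moves the root's bounds to $[7, 9]$; if the root's upper bound were 6 instead, the sketch would return `None`.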

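Lemma 9 (Parallel Bounds Refinement (1)) Let $\mathcal{T}$ be a strongly consistent i-tree, and let $(\mathcal{Q}_0, \rightsquigarrow^*)$ be a subtree of $\mathcal{T}$ which has the same root as $\mathcal{T}$ and is such that

• the nodes of $\mathcal{Q}_0$ are pairwise independent, that is,

$$\forall S_1, S_2 \in \mathcal{Q}_0.\ \neg \mathrm{disj}_{\mathcal{T}}(S_1, S_2)$$

Then the i-tree $\mathcal{T}'$ defined by

$$\mathcal{T}' \stackrel{\mathrm{def}}{=} \mathcal{T}[\forall S' \in \mathcal{Q}_0 : \mathrm{CInf}(S') \leftarrow \mathrm{CSup}(S')]$$

is such that its normal form $\mathcal{T}'_{NF} = \mathcal{R}_{NF}(\mathcal{T}')$ satisfies

1. $\mathcal{T}'_{NF} \neq \bot_d$

2. $\mathcal{T}'_{NF} \models \mathcal{T}$

3. $\forall S \in \mathcal{Q}_0.\ \forall Q \in \mathrm{Split}'_{NF}(S).\ Q \cap \mathcal{Q}_0 = \emptyset \Rightarrow \Sigma(\mathrm{CInf}'_{NF}[Q]) \le \mathrm{CInf}'_{NF}(S)$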

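Proof. If we apply $[\mathrm{CInf}(S') \leftarrow \mathrm{CSup}(S')]$ to every node $S'$ of $Q_0$, starting from the root and going towards the leaves, we always maintain C1, C2, C3 because $\Phi$ and $\mathrm{CSup}$ are never modified. We also maintain C4 because, for each $S \in Q_0$ and each view $Q \in \mathrm{Split}(S)$ such that there exists $S' \in Q \cap Q_0$, we know by pairwise independence ($\neg\mathrm{disj}_T(S_1, S_2)$ for all $S_1, S_2 \in Q_0$) that $S'$ is the only modified node of $Q$, and

$$\begin{aligned}
\mathrm{CInf}'(S) &= \mathrm{CSup}(S)\\
&\ge \Sigma(\mathrm{CInf}[Q - \{S'\}]) + \mathrm{CSup}(S') && \text{(by C7)}\\
&= \Sigma(\mathrm{CInf}'[Q])
\end{aligned}$$

Therefore, $T'$ is already in normal form, so $T'_{NF}$ is identical to $T'$ and is clearly distinct from $\bot_d$, proving condition 1. Condition 2 holds because $T' \models T$, since the cardinality bounds in $T'$ are at least as strong as in $T$. Condition 3 holds because

$$\begin{aligned}
\Sigma(\mathrm{CInf}'[Q]) &= \Sigma(\mathrm{CInf}[Q]) && (\text{because } Q \cap Q_0 = \emptyset)\\
&\le \mathrm{CSup}(S) && \text{(by C8)}\\
&= \mathrm{CInf}'(S) && (\text{by definition of } T')
\end{aligned}$$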



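Lemma 21 (Parallel Bounds Refinement (2)) Let $T$ be a strongly consistent i-tree, and let $S_1, S_2, S'_1, S'_2, S \in \mathbb{S}$ and $C, Q_1, Q_2$ be such that

$$\begin{aligned}
& C \in \mathrm{Comp}(S),\quad Q_1, Q_2 \in \mathrm{Split}(S)\\
& S_1 \rightsquigarrow^{*} S'_1 \wedge S'_1 \in Q_1 \wedge Q_1 \subseteq C\\
& S_2 \rightsquigarrow^{*} S'_2 \wedge S'_2 \in Q_2 \wedge Q_2 \subseteq C\\
& Q_1 \neq Q_2
\end{aligned}$$

and define

$$\begin{aligned}
T' &\stackrel{\mathrm{def}}{=} T[\forall S'.\ S_1 \rightsquigarrow^{*} S' \rightsquigarrow^{*} S'_1 : \mathrm{CInf}(S') \leftarrow \mathrm{CSup}(S')]\\
&\qquad [\forall S'.\ S_2 \rightsquigarrow^{*} S' \rightsquigarrow^{*} S'_2 : \mathrm{CInf}(S') \leftarrow \mathrm{CSup}(S')]\\
T'_{NF} &= R_{NF}(T')\\
T'' &\stackrel{\mathrm{def}}{=} R_{NF}(T'_{NF}[\mathrm{CSup}(S) \leftarrow \mathrm{CInf}'_{NF}(S)])
\end{aligned}$$

Then $T'' \neq \bot_d$, $T'' \models T$, and $\mathrm{CSup}''(S) < \Sigma(\mathrm{CSup}''[C])$.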

Proof. As in the proof of Lemma 9, the weak consistency conditions hold in $T'$ for all nodes $S'$ such that $S_1 \rightsquigarrow^{*} S' \rightsquigarrow^{*} S'_1$ or $S_2 \rightsquigarrow^{*} S' \rightsquigarrow^{*} S'_2$. Therefore, the only rewrite step that may be applicable in $T'$ is the application of UpInf to $S$. This application may lead to applications of other instances of UpInf, but the proof of Lemma 8 shows that this process results in a weakly consistent i-tree, so $T'_{NF} \neq \bot_d$. Moreover, the process of computing $R_{NF}(T'_{NF}[\mathrm{CSup}(S) \leftarrow \mathrm{CInf}'_{NF}(S)])$ is identical to applying Lemma 8 to $T'_{NF}$ with bounds $i = s = \mathrm{CInf}'_{NF}(S)$, and therefore leads to a weakly consistent i-tree $T''$, so $T'' \neq \bot_d$.

The condition $T'' \models T$ follows because $R_{NF}$ is semantics-preserving and the updates of trees only shrink the bounds on nodes, so they convert a diagram into a stronger one.

To prove $\mathrm{CSup}''(S) < \Sigma(\mathrm{CSup}''[C])$, observe first that $\mathrm{CSup}''(S) = \mathrm{CInf}'_{NF}(S)$ by definition of $T''$, and that $\Sigma(\mathrm{CSup}''[C]) = \Sigma(\mathrm{CSup}[C])$ because, by part 3 of Lemma 8, CSup is modified only for $S$ and its ancestors, whereas the nodes of $C$ are sons of $S$. Therefore, it suffices to show

$$\mathrm{CInf}'_{NF}(S) < \Sigma(\mathrm{CSup}[C]).$$

We prove this condition by distinguishing two cases.

1. UpInf is not applicable to $S$. Then $\mathrm{CInf}'_{NF}(S) = \mathrm{CInf}(S)$ and the condition follows by C9.

2. UpInf is applicable to $S$. Then for some $a, b$ with $\{a, b\} = \{1, 2\}$ we have

$$\begin{aligned}
\mathrm{CInf}'_{NF}(S) &= \Sigma(\mathrm{CInf}'[Q_a])\\
&< \Sigma(\mathrm{CInf}'[Q_a]) + \Sigma(\mathrm{CInf}'[Q_b])\\
&\le \Sigma(\mathrm{CInf}'[C])\\
&\le \Sigma(\mathrm{CSup}[C])
\end{aligned}$$



A.4.3 Completeness of the algorithm Subsumes 

Theorem 4 Let $T$ be a strongly consistent i-tree and, for an atomic formula $A$, let $H_A$ be as defined in Figure 12. Then $H_A$ holds if and only if $T \models A$.

The ($\Rightarrow$) direction of Theorem 4 is trivial by the semantics of i-diagrams. For the ($\Leftarrow$) direction we prove the following characterizations:



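$$\begin{aligned}
S \neq \emptyset_d &\;\Rightarrow\; \exists\mathcal{M}.\ \alpha(S) \neq \emptyset\\
k \in [\mathrm{CInf}(S), \mathrm{CSup}(S)] &\;\Rightarrow\; \exists\mathcal{M}.\ |\alpha(S)| = k\\
\neg\mathrm{Included}(S_0, C, T) &\;\Rightarrow\; \exists\mathcal{M}.\ \alpha(S_0) \not\subseteq \textstyle\bigcup \alpha[C]\\
S_1 \neq \emptyset_d \wedge \neg(S_1 \rightsquigarrow^{*} S_2) &\;\Rightarrow\; \exists\mathcal{M}.\ \alpha(S_1) \not\subseteq \alpha(S_2)\\
\ldots &\;\Rightarrow\; \exists\mathcal{M}.\ \alpha(S_1) \neq \alpha(S_2)\\
\emptyset_d \notin \{S_1, S_2\} \wedge \neg\mathrm{disj}_T(S_1, S_2) &\;\Rightarrow\; \exists\mathcal{M}.\ \alpha(S_1) \cap \alpha(S_2) \neq \emptyset\\
(+P) \notin \Phi(S) &\;\Rightarrow\; \exists\mathcal{M}.\ \alpha(S) \not\subseteq \beta(P)\\
(-P) \notin \Phi(S) &\;\Rightarrow\; \exists\mathcal{M}.\ \alpha(S) \not\subseteq \beta(P)^c
\end{aligned}$$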



where $\mathcal{M}$ denotes a model $\mathcal{M} = (A, \alpha, \beta)$ of $T$. We next present the remaining lemmas that prove these characterizations (see also Section 7).

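Lemma 12 If an i-tree $T$ is strongly consistent, $S_0 \in \mathbb{S}$, $C \in \mathcal{P}(\mathbb{S})$, and $\mathrm{Included}(S_0, C, T)$ returns false, then

$$\exists\mathcal{M}.\ \alpha(S_0) \not\subseteq \textstyle\bigcup \alpha[C].$$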



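Proof. Let $Q_0$ be the smallest set of nodes such that:

$$\begin{aligned}
& S_0 \rightsquigarrow^{*} S \Rightarrow S \in Q_0\\
& S \in Q_0 \wedge Q \in \mathrm{Comp}(S) \wedge S_1 \in Q \wedge \neg\mathrm{Incl}(S_1, C) \Rightarrow S_1 \in Q_0
\end{aligned}$$

By definition of Incl we know that $Q_0 \cap C = \emptyset$.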

$Q_0$ is tree-shaped by construction, but may contain two nodes that are explicitly disjoint. We compute a subtree $Q_1$ of $Q_0$ by starting from the root and keeping, at each step, at most one son for each complete view. We also require that $Q_1$ contain $S_0$, by never cutting the branch that leads to $S_0$.
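The pruning step from $Q_0$ to $Q_1$ can be pictured with the Python sketch below. It assumes, hypothetically, that the tree is given as a map `comps` from each node name to its complete views (lists of son names), that `q0` is the set $Q_0$, and that `target_path` lists the nodes from the root down to $S_0$; the function name is ours, not the paper's. The sketch only illustrates the selection rule stated in the text and does not itself check explicit disjointness.

```python
from typing import Dict, List, Set


def prune_to_branching_subtree(
    root: str,
    comps: Dict[str, List[List[str]]],
    q0: Set[str],
    target_path: List[str],
) -> Set[str]:
    """Hypothetical sketch of computing Q1 from Q0.

    Walks down from the root, keeps at most one son from Q0 per complete
    view, and prefers the son that lies on the branch leading to S0.
    """
    on_path = set(target_path)
    q1 = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for view in comps.get(node, []):
            candidates = [son for son in view if son in q0]
            if not candidates:
                continue
            preferred = [son for son in candidates if son in on_path]
            chosen = preferred[0] if preferred else candidates[0]
            if chosen not in q1:
                q1.add(chosen)
                stack.append(chosen)
    return q1


comps = {"R": [["A", "B"]], "A": [["A1", "A2"]]}
q0 = {"R", "A", "B", "A1"}
print(sorted(prune_to_branching_subtree("R", comps, q0, ["R", "A", "A1"])))
# ['A', 'A1', 'R']
```

Here the complete view {A, B} of R keeps only A, because A lies on the branch to the target A1; choosing the target B instead would keep only B.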

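We then define

$$T'_{NF} = R_{NF}(T[\forall S \in Q_1 : \mathrm{CInf}(S) \leftarrow \mathrm{CSup}(S)]).$$

According to Lemma 9 (Parallel Bounds Refinement), $T'_{NF} \neq \bot_d$ and $T'_{NF} \models T$.

We then apply Lemma 20 to construct a model for the weakly consistent i-tree $T'_{NF}$ such that

$$\forall S' \in \mathbb{S}.\ (x \in \alpha(S') \Leftrightarrow S' \in Q_1)$$

for some element $x$; since $T'_{NF} \models T$, this is also a model of $T$. Because $S_0 \in Q_1$, we have $x \in \alpha(S_0)$. Because $C \cap Q_1 = \emptyset$ (as $Q_1 \subseteq Q_0$), we have $x \notin \bigcup \alpha[C]$. ■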

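Lemma 15 If an i-tree $T$ is strongly consistent, then for all $S \in \mathbb{S}$ and $P \in P_N$ we have

$$\begin{aligned}
(+P) \notin \Phi(S) &\Rightarrow \exists\mathcal{M}.\ \mathcal{M} \models T \wedge \alpha_{\mathcal{M}}(S) \not\subseteq \beta_{\mathcal{M}}(P)\\
(-P) \notin \Phi(S) &\Rightarrow \exists\mathcal{M}.\ \mathcal{M} \models T \wedge \alpha_{\mathcal{M}}(S) \not\subseteq \beta_{\mathcal{M}}(P)^c
\end{aligned}$$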

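Proof. Let $S \in \mathbb{S}$ be such that $(+P) \notin \Phi(S)$. We define $Q_{+P} \stackrel{\mathrm{def}}{=} \{S' \in \mathbb{S} \mid (+P) \in \Phi(S')\}$. Using C10 we show that $\mathrm{Included}(S, Q_{+P}, T)$ cannot return true. Then, by Lemma 12, there exists a model such that $\alpha(S) \not\subseteq \bigcup \alpha[Q_{+P}]$. We then change the model by redefining $\beta$ as $\beta'$:

$$\forall P \in P_N.\ \beta'(P) = \bigcup\{\alpha(S') \mid S' \in Q_{+P}\}$$

to ensure that $\alpha(S) \not\subseteq \beta'(P)$. The case of $(-P) \notin \Phi(S)$ is analogous: we take a model such that $\alpha(S) \not\subseteq \bigcup \alpha[Q_{-P}]$ and redefine $\beta'$ by:

$$\forall P \in P_N.\ \beta'(P) = A - \bigcup\{\alpha(S') \mid S' \in Q_{-P}\}$$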
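The model surgery in this proof is plain set arithmetic on a finite model. The Python sketch below assumes a model given by a finite domain `domain` (playing the role of $A$) and a map `alpha` from node names to sets; the names `beta_plus` and `beta_minus` and the concrete sets are illustrative only. It shows that, when $\alpha(S) \not\subseteq \bigcup \alpha[Q_{+P}]$, the redefined $\beta'(P)$ satisfies $\alpha(S) \not\subseteq \beta'(P)$ while still containing $\alpha(S')$ for every $S' \in Q_{+P}$, and symmetrically for the $(-P)$ case.

```python
from typing import Dict, Iterable, Set


def beta_plus(alpha: Dict[str, Set[int]], q_plus: Iterable[str]) -> Set[int]:
    """beta'(P) = union of alpha(S') over S' in Q_{+P} (empty union = empty set)."""
    return set().union(*(alpha[s] for s in q_plus))


def beta_minus(domain: Set[int], alpha: Dict[str, Set[int]],
               q_minus: Iterable[str]) -> Set[int]:
    """beta'(P) = A minus the union of alpha(S') over S' in Q_{-P}."""
    return set(domain) - set().union(*(alpha[s] for s in q_minus))


domain = {1, 2, 3, 4, 5}
alpha = {"S": {1, 2}, "S1": {2, 3}, "S2": {4}}

# (+P) case: alpha(S) is not covered by the nodes that carry +P ...
bp = beta_plus(alpha, ["S1", "S2"])
print(sorted(bp), alpha["S"] <= bp)                  # [2, 3, 4] False
# ... and every node that carries +P still lies inside beta'(P).
print(all(alpha[s] <= bp for s in ["S1", "S2"]))     # True

# (-P) case: alpha(S) is not inside the complement of beta'(P).
bm = beta_minus(domain, alpha, ["S1"])
print(sorted(bm), alpha["S"] <= domain - bm)         # [1, 4, 5] False
```

The element 1 is the witness in both cases: it belongs to $\alpha(S)$ but to no $\alpha(S')$ with $S' \in Q_{\pm P}$.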



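Lemma 16 If an i-tree $T$ is strongly consistent, then for all $S_1, S_2 \in \mathbb{S}$ such that $S_1 \neq \emptyset_d$ and $S_2 \neq \emptyset_d$ we have

$$\neg\mathrm{disj}_T(S_1, S_2) \Rightarrow \exists\mathcal{M}.\ \alpha(S_1) \cap \alpha(S_2) \neq \emptyset.$$

Proof. Let $S_1, S_2 \in \mathbb{S} \setminus \{\emptyset_d\}$ be such that $\neg\mathrm{disj}_T(S_1, S_2)$. If $S_1 = S_2$, Lemma 10 gives a model $\mathcal{M}$ where $\alpha(S_1) = \alpha(S_2) \neq \emptyset$; in this model $\alpha(S_1) \cap \alpha(S_2) = \alpha(S_2) \neq \emptyset$. Suppose $S_1 \neq S_2$. Define $S'_1, S'_2, S_0$ as the unique nodes such that $S_0$ is the least common ancestor of $S_1$ and $S_2$ in $T$, and $S'_1, S'_2 \in \mathrm{Sons}(S_0)$ are the ancestors of $S_1$ and $S_2$, respectively. We distinguish two cases:

• $S'_1$ and $S'_2$ do not belong to the same complete view of $S_0$. Then apply Lemma 9 to the subtree

$$Q_0 \stackrel{\mathrm{def}}{=} \{S \in \mathbb{S} \mid S_1 \rightsquigarrow^{*} S \vee S_2 \rightsquigarrow^{*} S\}$$

whose nodes are pairwise independent by the hypothesis $\neg\mathrm{disj}_T(S_1, S_2)$. The resulting tree $T'_{NF}$ satisfies the hypothesis of Lemma 18, so there exists a model $\mathcal{M} = (A, \alpha, \beta)$ for $T'_{NF}$ such that $\alpha(S_1) \cap \alpha(S_2) \neq \emptyset$. $\mathcal{M}$ is also a model of $T$ because $T'_{NF} \models T$.

• $S'_1$ and $S'_2$ belong to the same complete view $C$. Define

$$\begin{aligned}
T' &\stackrel{\mathrm{def}}{=} R_{NF}(T[\forall S'.\ S_1 \rightsquigarrow^{*} S' \rightsquigarrow^{*} S'_1 : \mathrm{CInf}(S') \leftarrow \mathrm{CSup}(S')]\\
&\qquad [\forall S'.\ S_2 \rightsquigarrow^{*} S' \rightsquigarrow^{*} S'_2 : \mathrm{CInf}(S') \leftarrow \mathrm{CSup}(S')])\\
T'' &\stackrel{\mathrm{def}}{=} R_{NF}(T'[\mathrm{CSup}(S_0) \leftarrow \mathrm{CInf}'(S_0)])
\end{aligned}$$

By Lemma 21, $T'' \models T$ and $\mathrm{CSup}''(S_0) < \Sigma(\mathrm{CSup}''[C])$. This last property allows us to apply Lemma 19 and prove the existence of a model $(A, \alpha, \beta)$ for $T''$, and hence for $T$, such that $\alpha(S_1) \cap \alpha(S_2) \neq \emptyset$. ■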
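The case split in the proof of Lemma 16 only needs the least common ancestor $S_0$, the sons $S'_1, S'_2$ of $S_0$ on the two branches, and a membership test on the complete views of $S_0$. The Python sketch below assumes, hypothetically, that the i-tree is given as a `father` map between node names and a `comps` map from a node to its complete views; all function names are illustrative. It rejects the degenerate situation in which one node is an ancestor of the other, for which the sons $S'_1, S'_2$ are not defined.

```python
from typing import Dict, List, Tuple


def ancestors(node: str, father: Dict[str, str]) -> List[str]:
    """Return [node, father(node), ..., root]."""
    chain = [node]
    while node in father:
        node = father[node]
        chain.append(node)
    return chain


def lca_and_sons(s1: str, s2: str, father: Dict[str, str]) -> Tuple[str, str, str]:
    """Return (S0, S1', S2'): the least common ancestor of s1 and s2 and its
    sons on the branches towards s1 and s2."""
    a1, a2 = ancestors(s1, father), ancestors(s2, father)
    on_a2 = set(a2)
    s0 = next(n for n in a1 if n in on_a2)  # first shared node walking upwards
    if s0 in (s1, s2):
        raise ValueError("one node is an ancestor of the other")
    return s0, a1[a1.index(s0) - 1], a2[a2.index(s0) - 1]


def same_complete_view(s0: str, son1: str, son2: str,
                       comps: Dict[str, List[List[str]]]) -> bool:
    """True when S1' and S2' occur together in some complete view of S0."""
    return any(son1 in view and son2 in view for view in comps.get(s0, []))


father = {"A": "R", "B": "R", "A1": "A", "B1": "B"}
s0, son1, son2 = lca_and_sons("A1", "B1", father)
print(s0, son1, son2)                                            # R A B
print(same_complete_view(s0, son1, son2, {"R": [["A", "B"]]}))    # True
print(same_complete_view(s0, son1, son2, {"R": [["A"], ["B"]]}))  # False
```

A `True` result corresponds to the second case of the proof (Lemmas 21 and 19), a `False` result to the first case (Lemmas 9 and 18).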